Abstract

We introduce an abductive method for a coherent integration of independent data-sources. The idea is to compute a list of data-facts that should be inserted into the amalgamated database or retracted from it in order to restore its consistency. This method is implemented by an abductive solver, called A-system, that applies SLDNFA-resolution on a meta-theory that relates different, possibly contradicting, input databases. We also give a purely model-theoretic analysis of the possible ways to 'recover' consistent data from an inconsistent database in terms of those models of the database that exhibit as little inconsistent information as reasonably possible. This allows us to characterize the 'recovered databases' in terms of the 'preferred' (i.e., most consistent) models of the theory. The outcome is an abductive-based application that is sound and complete with respect to a corresponding model-based, preferential semantics, and, to the best of our knowledge, is more expressive (thus more general) than any other implementation of coherent integration of databases.

1. Introduction 

Complex reasoning tasks often have to integrate information that comes from different sources. One of the major challenges in this respect is to compose contradicting sources of information such that what is obtained properly reflects the combination of the data-sources on the one hand¹, and is still coherent (in terms of consistency) on the other hand. There are a number of different issues involved in this process, the most important of which are the following:

1. Unification of the different ontologies and/or database schemas, in order to get a fixed (global) schema, and a translation of the integrity constraints² of each database to the new ontology.

2. Unification of the translated integrity constraints into a single global set of integrity constraints. This means, in particular, elimination of contradictions among the translated integrity constraints, and inclusion of any global integrity constraint that is imposed on the integration process.

3. Integration of databases w.r.t. the unified set of integrity constraints, computed according to the previous item.

1. This property is sometimes called compositionality (Verbaeten, Denecker, & De Schreye, 1997, 2000).

2. I.e., the rules that represent intensional truths of a database domain.

Each one of the issues mentioned above has its own difficulties and challenges. For instance, the first issue is considered by Ullman (2000) and Lenzerini (2001, 2002), who deal with questions such as how to express the relations between the 'global database schema' and the source (local) schemas, and how this influences query processing with respect to the global schema (Bertossi, Chomicki, Cortes, & Gutierrez, 2002).³

The second issue above is concerned with the construction of a single, classically consistent set of integrity constraints that is applied to the integrated data. In a database context, it is common to assume that such a set is predefined and consists of global integrity constraints that are imposed on the integration process itself (Bertossi et al., 2002; Lenzerini, 2002). In such a case there is no need to derive these constraints from the local databases. When different integrity constraints are specified in different local databases, it is necessary to integrate not only the database instances (as specified in issue 3 above), but also the integrity constraints themselves (issue 2). The reason for separating these two topics is that integrity constraints represent truths that should be valid in all situations, while a database instance exhibits an extensional truth, i.e., an actual situation. Consequently, the policy of resolving contradictions among integrity constraints is often different from the one that is applied to database facts, and often the former is applied before the latter.

Despite their different nature, both issues rely on formalisms that tolerate contradictions and allow one to draw plausible conclusions from inconsistent situations. Roughly, there are two approaches to handling this problem:

• Paraconsistent formalisms, in which the amalgamated data may remain inconsistent, but the set of conclusions implied by it is not explosive, i.e., not every fact follows from an inconsistent database, and so the inference process does not become trivial in the presence of contradictions. Paraconsistent procedures for integrating data, like those of Subrahmanian (1994) and de Amo, Carnielli, and Marcos (2002), are often based on paraconsistent reasoning systems, such as LFI (Carnielli & Marcos, 2001), annotated logics (Subrahmanian, 1990; Kifer & Lozinskii, 1992; Arenas, Bertossi, & Kifer, 2000), or other non-classical proof procedures (Priest, 1991; Arieli & Avron, 1996; Avron, 2002; Carnielli & Marcos, 2002).⁴

• Coherent (consistency-based) methods, in which the amalgamated data is revised in order to restore consistency (see, e.g., Baral, Kraus, & Minker, 1991; Baral, Kraus, Minker, & Subrahmanian, 1992; Benferhat, Dubois, & Prade, 1995; Arenas, Bertossi, & Chomicki, 1999; Arieli & Avron, 1999; Greco & Zumpano, 2000; Liberatore & Schaerf, 2000; Bertossi & Schwind, 2002; Arieli, Denecker, Van Nuffelen, & Bruynooghe, 2004). In many cases the underlying formalisms of these approaches are closely related to the theory of belief revision (Alchourron, Gardenfors, & Makinson, 1995; Gardenfors & Rott, 1995). In the context of database systems the idea is to consider consistent databases that are 'as close as possible' to the original database. These 'repaired' instances of the 'spoiled' database correspond to plausible and compact ways of restoring consistency.

3. For surveys on schema matching and related aspects, see also (Batini, Lenzerini, & Navathe, 1986) and (Rahm & Bernstein, 2001).

4. See also (Decker, 2003) for a historical perspective and some computational remarks on this kind of formalisms.

In this paper we follow the latter approach, and consider abductive approaches that handle the third issue above, namely: coherent methods for integrating different data-sources (with the same ontology) w.r.t. a consistent set of integrity constraints.⁵ The main difficulty in this process stems from the fact that even when each local database is consistent, the collective information of all the data-sources may not be consistent anymore. In particular, facts that are specified in a particular database may violate some integrity constraints defined elsewhere, and so this data might contradict some elements in the unified set of integrity constraints. Moreover, as noted, e.g., in (Lenzerini, 2001; Calì, Calvanese, De Giacomo, & Lenzerini, 2002), the ability to handle incomplete and inconsistent data in a plausible way is an inherent property of any system for data integration with integrity constraints, no matter which integration phase is considered. Providing proper ways of achieving this property is a major concern here as well.

Our goal is therefore to find ways to properly 'repair' a combined (unified) database and restore its consistency. For this, we consider a purely declarative representation of the composition of distributed data by a meta-theory, relating a number of different input databases (that may contradict each other) with a consistent output database. The underlying language of the theory is that of abductive logic programming (Kakas, Kowalski, & Toni, 1992; Denecker & Kakas, 2000). For reasoning with such theories we use an abductive system, called A-system (Kakas, Van Nuffelen, & Denecker, 2001; Van Nuffelen & Kakas, 2001), which is an abductive solver implementing SLDNFA-resolution (Denecker & De Schreye, 1992, 1998). The composing system is implemented by abductive reasoning on the meta-theory. In the context of this work, we have extended this system with an optimizing component that allows us to compute preferred coherent ways to restore the consistency of a given database. The system that is obtained induces an operational semantics for database integration. In the sequel we also consider some model-theoretic aspects of the problem, and define a preferential semantics (Shoham, 1988) for it. According to this semantics, the repaired databases are characterized in terms of the preferred models (i.e., the most consistent valuations) of the underlying theory. We relate these approaches by showing that the A-system is sound and complete w.r.t. the model-based semantics. We also note that our framework supports reasoning with various types of special information, such as timestamps and source identification. Some implementation issues and experimental results are discussed as well.

5. In this sense, one may view this work as a method for restoring the consistency of a single inconsistent database. We prefer, however, to treat it as an integration process of multiple sources, since it also has some mediating capabilities, such as source identification, making priorities among different data-sources, etc. (see, e.g., Section 4.6).

The rest of this paper is organized as follows: in the next section we formally define our goal, namely: a coherent way to integrate different data-sources. In Section 3 we set up a semantics for this goal in terms of a corresponding model theory. Then, in Section 4 we introduce our abductive-based application for database integration. This is the main section of this paper, in which we also describe how a given integration problem can be represented in terms of meta logic programs, show how to reason with these programs by abductive computational models, present some experimental results, consider proper ways of reasoning with several types of special data, and show that our application is sound and complete with respect to the model-based semantics considered in Section 3. Section 5 contains an overview of some related works, and in Section 6 we conclude with some further remarks, open issues, and future work.⁶

2. Coherent Integration of Databases 

We begin with a formal definition of our goal. In this paper we assume that we have a first-order language $\mathcal{L}$, based on a fixed database schema $\mathcal{S}$, and a fixed domain $D$. Every element of $D$ has a unique name. A database instance $\mathcal{D}$ consists of atoms in the language $\mathcal{L}$ that are instances of the schema $\mathcal{S}$. As such, every instance $\mathcal{D}$ has a finite active domain, which is a subset of $D$.

Definition 1 A database is a pair $(\mathcal{D}, \mathcal{IC})$, where $\mathcal{D}$ is a database instance, and $\mathcal{IC}$, the set of integrity constraints, is a finite and classically consistent set of formulae in $\mathcal{L}$.

Given a database $\mathcal{DB} = (\mathcal{D}, \mathcal{IC})$, we apply to it the closed world assumption, so only the facts that are explicitly mentioned in $\mathcal{D}$ are considered true. The underlying semantics corresponds, therefore, to minimal Herbrand interpretations.

Definition 2 The minimal Herbrand model $\mathcal{H}^{\mathcal{D}}$ of a database instance $\mathcal{D}$ is the model of $\mathcal{D}$ that assigns true to all the ground instances of atomic formulae in $\mathcal{D}$, and false to all the other atoms.

There are different views on a database. One view is that it is a logic theory consisting of atoms and, implicitly, the closed world assumption (CWA) that indicates that all atoms not in the database are false. Another common view of a database is that it is a structure that consists of a certain domain and corresponding relations, representing the state of the world. Whenever there is complete knowledge and all true atoms are represented in the database, both views coincide: the unique Herbrand model of the theory is the intended structure. However, in the context of independent data-sources, the assumption that each local database represents the state of the world is obviously false. Nevertheless, we can still view a local database as an incomplete theory, and so treating a database as a theory rather than as a structure is more appropriate in our case.



6. This is a combined and extended version of (Arieli, Van Nuffelen, Denecker, & Bruynooghe, 2001) and (Arieli, Denecker, Van Nuffelen, & Bruynooghe, 2002).
Definition 3 A formula $\psi$ follows from a database instance $\mathcal{D}$ (alternatively, $\mathcal{D}$ entails $\psi$; notation: $\mathcal{D} \models \psi$) if the minimal Herbrand model of $\mathcal{D}$ is also a model of $\psi$.

Definition 4 A database $\mathcal{DB} = (\mathcal{D}, \mathcal{IC})$ is consistent if every formula in $\mathcal{IC}$ follows from $\mathcal{D}$ (notation: $\mathcal{D} \models \mathcal{IC}$).
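To make Definitions 2 to 4 concrete, here is a minimal Python sketch (an illustration only, not part of the system described in this paper). It assumes a hypothetical encoding: a ground atom is a tuple such as ("teaches", "c1", "n1"), a database instance is a frozenset of atoms, and a formula is a Python predicate over the set of true atoms. Because the closed world assumption makes exactly the atoms of $\mathcal{D}$ true in $\mathcal{H}^{\mathcal{D}}$, entailment amounts to evaluating the predicate on $\mathcal{D}$ itself. The names entails, is_consistent and same_course_same_teacher are ours.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

Atom = Tuple[str, ...]                # hypothetical encoding, e.g. ("teaches", "c1", "n1")
Instance = FrozenSet[Atom]            # a database instance: a finite set of ground atoms
Formula = Callable[[Instance], bool]  # a formula, evaluated on the set of true atoms


def entails(instance: Instance, formula: Formula) -> bool:
    """D |= psi (Definition 3): psi holds in the minimal Herbrand model of D.

    Under the closed world assumption that model makes exactly the atoms
    of D true, so the predicate is simply evaluated on D itself.
    """
    return formula(instance)


def is_consistent(instance: Instance, ics: Iterable[Formula]) -> bool:
    """(D, IC) is consistent (Definition 4) iff every constraint follows from D."""
    return all(entails(instance, ic) for ic in ics)


def same_course_same_teacher(model: Instance) -> bool:
    """forall X,Y,Z: teaches(X,Y) & teaches(X,Z) -> Y = Z (the constraint of Example 1)."""
    pairs = [(a[1], a[2]) for a in model if a[0] == "teaches"]
    return all(y == z for (x1, y) in pairs for (x2, z) in pairs if x1 == x2)
```

For instance, the instance {teaches(c1, n1), teaches(c2, n2)} is consistent with the constraint of Example 1 below, while adding teaches(c2, n3) makes is_consistent return False.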

Our goal is to integrate $n$ consistent local databases, $\mathcal{DB}_i = (\mathcal{D}_i, \mathcal{IC}_i)$ ($i = 1, \ldots, n$), into one consistent database that contains as much information as possible from the local databases. The idea, therefore, is to consider the union of the distributed data, and then to restore its consistency in such a way that as much information as possible is preserved.

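Notation 1 Let $\mathcal{DB}_i = (\mathcal{D}_i, \mathcal{IC}_i)$, $i = 1, \ldots, n$, and let $\mathcal{I}(\mathcal{IC}_1, \ldots, \mathcal{IC}_n)$ be a classically consistent set of integrity constraints. We denote:

$$\mathcal{UDB} = \Big( \bigcup_{i=1}^{n} \mathcal{D}_i,\ \mathcal{I}(\mathcal{IC}_1, \ldots, \mathcal{IC}_n) \Big).$$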

In the notation above, $\mathcal{I}$ is an operator that combines the integrity constraints and eliminates contradictions (see, e.g., Alferes, Leite, Pereira, & Quaresma, 2000; Alferes, Pereira, Przymusinska, & Przymusinski, 2002). As we have already noted, how to choose this operator and how to apply it on a specific database is beyond the scope of this paper. In cases where the union of all the integrity constraints is classically consistent, it makes sense to take $\mathcal{I}$ as the union operator. Global consistency of the integrity constraints is indeed a common assumption in the database literature (Arenas et al., 1999; Greco & Zumpano, 2000; Greco, Greco, & Zumpano, 2001; Bertossi et al., 2002; Konieczny & Pino Perez, 2002; Lenzerini, 2002), but for the discussion here it is possible to take, instead of the union, any operator $\mathcal{I}$ for consistency restoration.

A key notion in database integration is the following:

Definition 5 A repair of a database $\mathcal{DB} = (\mathcal{D}, \mathcal{IC})$ is a pair (Insert, Retract), such that:

1. $\mathrm{Insert} \cap \mathcal{D} = \emptyset$,

2. $\mathrm{Retract} \subseteq \mathcal{D}$,⁷

3. $(\mathcal{D} \cup \mathrm{Insert} \setminus \mathrm{Retract}, \mathcal{IC})$ is a consistent database.

Intuitively, Insert is a set of elements that should be inserted into $\mathcal{D}$ and Retract is a set of elements that should be removed from $\mathcal{D}$ in order to have a consistent database.
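The three conditions of Definition 5 translate directly into a check. The Python sketch below assumes, for brevity, propositional atoms encoded as strings and integrity constraints given as predicates over the set of true atoms (a hypothetical encoding); the helper names apply_update and is_repair are illustrative.

```python
from typing import Callable, FrozenSet, Iterable

Instance = FrozenSet[str]             # propositional atoms, for brevity
Formula = Callable[[Instance], bool]  # hypothetical: a predicate over the true atoms


def apply_update(d: Instance, insert: Instance, retract: Instance) -> Instance:
    """The updated instance, read as (D | Insert) - Retract."""
    return frozenset((d | insert) - retract)


def is_repair(d: Instance, ics: Iterable[Formula],
              insert: Instance, retract: Instance) -> bool:
    """Check the three conditions of Definition 5 (closed world assumption)."""
    return (
        not (insert & d)                  # 1. Insert and D are disjoint
        and retract <= d                  # 2. Retract is a subset of D
        and all(ic(apply_update(d, insert, retract)) for ic in ics)  # 3. consistency
    )
```

With $\mathcal{D} = \{p, q, r\}$ and $\mathcal{IC} = \{\neg p\}$, for example, both $(\emptyset, \{p\})$ and $(\emptyset, \{p, q, r\})$ pass the check, whereas $(\emptyset, \emptyset)$ fails condition 3 and $(\{p\}, \{p\})$ fails condition 1.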

As noted above, repair of a given database is a key notion in many formalisms for data integration. In the context of database systems, this notion was first introduced by Arenas, Bertossi, and Chomicki (1999), and later considered by many others (e.g., Greco & Zumpano, 2000; Liberatore & Schaerf, 2000; Franconi, Palma, Leone, Perri, & Scarcello, 2001; Bertossi et al., 2002; Bertossi & Schwind, 2002; de Amo et al., 2002; Arenas, Bertossi, & Chomicki, 2003; Arieli et al., 2004). Earlier versions of repairs and inclusion-based consistency restoration may be traced back to Dalal (1988) and Winslett (1988).



7. Note that by conditions (1) and (2) it follows that $\mathrm{Insert} \cap \mathrm{Retract} = \emptyset$.
Definition 6 A repaired database of $\mathcal{DB} = (\mathcal{D}, \mathcal{IC})$ is a consistent database of the form $(\mathcal{D} \cup \mathrm{Insert} \setminus \mathrm{Retract}, \mathcal{IC})$, where (Insert, Retract) is a repair of $\mathcal{DB}$.

As there may be many ways to repair an inconsistent database,⁸ it is often convenient to define preferences among the possible repairs, and to consider only the most preferred ones. Below are two common criteria for preferring a repair (Insert, Retract) over a repair (Insert', Retract'):

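Definition 7 Let (Insert, Retract) and (Insert', Retract') be two repairs of a given database.

• set inclusion preference criterion: $(\mathrm{Insert}', \mathrm{Retract}') \leq_i (\mathrm{Insert}, \mathrm{Retract})$ if $\mathrm{Insert} \subseteq \mathrm{Insert}'$ and $\mathrm{Retract} \subseteq \mathrm{Retract}'$.

• minimal cardinality preference criterion: $(\mathrm{Insert}', \mathrm{Retract}') \leq_c (\mathrm{Insert}, \mathrm{Retract})$ if $|\mathrm{Insert}| + |\mathrm{Retract}| \leq |\mathrm{Insert}'| + |\mathrm{Retract}'|$.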
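The two criteria of Definition 7 can be sketched as Python predicates over pairs of frozensets (a hypothetical representation of repairs as (Insert, Retract) tuples). Note the orientation: in both relations the right-hand argument is the one that is at least as preferred.

```python
from typing import FrozenSet, Tuple

Repair = Tuple[FrozenSet[str], FrozenSet[str]]  # (Insert, Retract)


def leq_i(r1: Repair, r2: Repair) -> bool:
    """r1 <=_i r2: r2 inserts and retracts subsets of what r1 does."""
    (ins1, ret1), (ins2, ret2) = r1, r2
    return ins2 <= ins1 and ret2 <= ret1


def leq_c(r1: Repair, r2: Repair) -> bool:
    """r1 <=_c r2: r2 changes no more atoms than r1."""
    (ins1, ret1), (ins2, ret2) = r1, r2
    return len(ins2) + len(ret2) <= len(ins1) + len(ret1)
```

Under leq_i the repairs $(\emptyset, \{p\})$ and $(\{q\}, \emptyset)$ are incomparable, whereas under leq_c they are equivalent, since each changes exactly one atom; set inclusion is thus only a partial pre-order, while minimal cardinality is a total one.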

Set inclusion is also considered in (Arenas et al., 1999; Greco & Zumpano, 2000; Bertossi et al., 2002; Bertossi & Schwind, 2002; de Amo et al., 2002; Arenas et al., 2003; Arieli et al., 2004, and others); minimal cardinality is considered, e.g., in (Dalal, 1988; Liberatore & Schaerf, 2000; Arenas et al., 2003; Arieli et al., 2004).

In what follows we assume that the preference relation $\leq$ is a fixed pre-order that represents some preference criterion on the set of repairs (and we shall omit subscript notations in it whenever possible). We shall also assume that if $(\emptyset, \emptyset)$ is a valid repair, it is the $\leq$-greatest (i.e., the 'best') one. This corresponds to the intuition that a database should not be repaired unless it is inconsistent.

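Definition 8 A $\leq$-preferred repair of $\mathcal{DB}$ is a repair (Insert, Retract) of $\mathcal{DB}$ such that for every repair (Insert', Retract') of $\mathcal{DB}$, if $(\mathrm{Insert}, \mathrm{Retract}) \leq (\mathrm{Insert}', \mathrm{Retract}')$ then $(\mathrm{Insert}', \mathrm{Retract}') \leq (\mathrm{Insert}, \mathrm{Retract})$. The set of all the $\leq$-preferred repairs of $\mathcal{DB}$ is denoted by $\Delta(\mathcal{DB}, \leq)$.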

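Definition 9 A $\leq$-repaired database of $\mathcal{DB}$ is a repaired database of $\mathcal{DB}$, constructed from a $\leq$-preferred repair of $\mathcal{DB}$. The set of all the $\leq$-repaired databases is denoted by:

$$\mathcal{R}(\mathcal{DB}, \leq) = \{ (\mathcal{D} \cup \mathrm{Insert} \setminus \mathrm{Retract}, \mathcal{IC}) \mid (\mathrm{Insert}, \mathrm{Retract}) \in \Delta(\mathcal{DB}, \leq) \}.$$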

Note that if $\mathcal{DB}$ is consistent and $\leq$ is a preference relation, then $\mathcal{DB}$ is the only $\leq$-repaired database of itself (thus, there is nothing to repair in this case, as expected).
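For very small examples, Definitions 8 and 9 can be computed by exhaustive enumeration. The following Python sketch only illustrates the definitions and is not the abductive procedure used by the A-system; it assumes that inserted atoms are drawn from a finite, user-supplied set candidates (e.g., ground atoms over the active domain), and its running time is exponential in the number of atoms. All names are hypothetical.

```python
from itertools import chain, combinations
from typing import Callable, FrozenSet, Iterable, List, Tuple

Instance = FrozenSet[str]
Formula = Callable[[Instance], bool]          # predicate over the true atoms (hypothetical)
Repair = Tuple[Instance, Instance]            # (Insert, Retract)
Pref = Callable[[Repair, Repair], bool]


def subsets(s: Iterable[str]) -> List[Instance]:
    items = sorted(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]


def repairs(d: Instance, ics: List[Formula], candidates: Instance) -> List[Repair]:
    """All repairs whose inserted atoms come from `candidates`."""
    out = []
    for ins in subsets(candidates - d):       # condition 1 holds by construction
        for ret in subsets(d):                # condition 2 holds by construction
            if all(ic((d | ins) - ret) for ic in ics):   # condition 3
                out.append((ins, ret))
    return out


def leq_i(r1: Repair, r2: Repair) -> bool:
    return r2[0] <= r1[0] and r2[1] <= r1[1]


def leq_c(r1: Repair, r2: Repair) -> bool:
    return len(r2[0]) + len(r2[1]) <= len(r1[0]) + len(r1[1])


def preferred(reps: List[Repair], leq: Pref) -> List[Repair]:
    """Definition 8: r is preferred if r <= r2 implies r2 <= r for every repair r2."""
    return [r for r in reps if all(leq(r2, r) for r2 in reps if leq(r, r2))]


def repaired_databases(d: Instance, ics: List[Formula],
                       candidates: Instance, leq: Pref) -> List[Instance]:
    """Definition 9, returning only the instance part of each repaired database."""
    return [(d | ins) - ret for ins, ret in preferred(repairs(d, ics, candidates), leq)]
```

For $(\{p, q, r\}, \{\neg p\})$ the only preferred repair under either criterion is $(\emptyset, \{p\})$, giving the repaired instance $\{q, r\}$; for an already consistent instance the only preferred repair is $(\emptyset, \emptyset)$, in line with the remark above.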

Note 1 It is usual to refer to the $\leq$-repaired databases of $\mathcal{DB}$ as the consistent databases that are 'as close as possible' to $\mathcal{DB}$ itself (see, e.g., Arenas, Bertossi, & Chomicki, 1999; Liberatore & Schaerf, 2000; de Amo, Carnielli, & Marcos, 2002; Konieczny & Pino Perez, 2002; Arenas, Bertossi, & Chomicki, 2003; Arieli, Denecker, Van Nuffelen, & Bruynooghe, 2004). Indeed, let

$$\mathrm{dist}(\mathcal{D}_1, \mathcal{D}_2) = (\mathcal{D}_1 \setminus \mathcal{D}_2) \cup (\mathcal{D}_2 \setminus \mathcal{D}_1).$$



8. Some repairs may be trivial and/or useless, though. For instance, one way to eliminate the inconsistency in $(\mathcal{D}, \mathcal{IC}) = (\{p, q, r\}, \{\neg p\})$ is by deleting every element in $\mathcal{D}$, but this is certainly not the optimal way of restoring consistency in this case.
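It is easy to see that $\mathcal{DB}' = (\mathcal{D}', \mathcal{IC})$ is a $\leq_i$-repaired database of $\mathcal{DB} = (\mathcal{D}, \mathcal{IC})$ if the set $\mathrm{dist}(\mathcal{D}', \mathcal{D})$ is minimal (w.r.t. set inclusion) among all the sets of the form $\mathrm{dist}(\mathcal{D}'', \mathcal{D})$, where $\mathcal{D}'' \models \mathcal{IC}$. Similarly, if $|S|$ denotes the size of $S$, then $\mathcal{DB}' = (\mathcal{D}', \mathcal{IC})$ is a $\leq_c$-repaired database of $\mathcal{DB} = (\mathcal{D}, \mathcal{IC})$ if $|\mathrm{dist}(\mathcal{D}', \mathcal{D})| = \min\{ |\mathrm{dist}(\mathcal{D}'', \mathcal{D})| \mid \mathcal{D}'' \models \mathcal{IC} \}$.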
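A small Python sketch of dist, with atoms encoded as strings such as "p(a)" (a hypothetical encoding). It also illustrates why the two characterizations above agree with Definition 7: for every pair satisfying conditions 1 and 2 of Definition 5, $\mathrm{dist}(\mathcal{D} \cup \mathrm{Insert} \setminus \mathrm{Retract}, \mathcal{D}) = \mathrm{Insert} \cup \mathrm{Retract}$, and since this union is disjoint its size is $|\mathrm{Insert}| + |\mathrm{Retract}|$.

```python
from typing import FrozenSet, Iterable


def dist(d1: Iterable[str], d2: Iterable[str]) -> FrozenSet[str]:
    """dist(D1, D2) = (D1 - D2) | (D2 - D1), i.e. the symmetric difference."""
    d1, d2 = frozenset(d1), frozenset(d2)
    return (d1 - d2) | (d2 - d1)


def repair_distance(d: Iterable[str], insert: Iterable[str], retract: Iterable[str]) -> int:
    """|dist(D', D)| for D' = (D | Insert) - Retract.

    When Insert is disjoint from D and Retract is a subset of D, the
    distance set is exactly Insert | Retract.
    """
    d, insert, retract = frozenset(d), frozenset(insert), frozenset(retract)
    repaired = (d | insert) - retract
    return len(dist(repaired, d))
```

For the instance of Example 3 below, inserting q(b) and retracting p(b) are both at distance 1 from the original instance.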

Given $n$ databases and a preference criterion $\leq$, our goal is therefore to compute the set $\mathcal{R}(\mathcal{UDB}, \leq)$ of the $\leq$-repaired databases of the unified database $\mathcal{UDB}$ (Notation 1). The reasoner may use different strategies to determine the consequences of this set. Among the common approaches are the skeptical (conservative) one, which is based on a 'consensus' among all the elements of $\mathcal{R}(\mathcal{UDB}, \leq)$ (see Arenas et al., 1999; Greco & Zumpano, 2000), a 'credulous' approach, in which entailments are determined by any element in $\mathcal{R}(\mathcal{UDB}, \leq)$, an approach that is based on a 'majority vote' (Lin & Mendelzon, 1998; Konieczny & Pino Perez, 2002), etc. In cases where processing time is a major consideration, one may want to speed up the computations by considering any repaired database. In such cases it is sufficient to find an arbitrary element in the set $\mathcal{R}(\mathcal{UDB}, \leq)$.
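The skeptical and credulous strategies can be sketched for atomic queries as follows, where each repaired database is represented only by its instance (a set of hypothetically encoded ground atoms). Majority-based merging is not shown, since it depends on the specific operator of the cited works.

```python
from typing import FrozenSet, Hashable, Iterable


def skeptical(repaired: Iterable[Iterable[Hashable]]) -> FrozenSet[Hashable]:
    """Atoms true in every repaired instance (the 'consensus' approach)."""
    instances = [frozenset(r) for r in repaired]
    if not instances:
        return frozenset()
    return instances[0].intersection(*instances[1:])


def credulous(repaired: Iterable[Iterable[Hashable]]) -> FrozenSet[Hashable]:
    """Atoms true in at least one repaired instance."""
    return frozenset().union(*[frozenset(r) for r in repaired])
```

On the two repaired databases of Example 1 below, skeptical returns only teaches(c1, n1), whereas credulous also contains teaches(c2, n2) and teaches(c2, n3).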

Below are some examples⁹ of the integration process.¹⁰

Example 1 Consider a relation teaches of the schema (course_name, teacher_name), and an integrity constraint stating that the same course cannot be taught by two different teachers:

$$\mathcal{IC} = \{ \forall X \forall Y \forall Z\, (teaches(X, Y) \wedge teaches(X, Z) \rightarrow Y = Z) \}.$$

Consider now the following two databases:

$$\mathcal{DB}_1 = (\{teaches(c_1, n_1),\ teaches(c_2, n_2)\}, \mathcal{IC}),$$

$$\mathcal{DB}_2 = (\{teaches(c_2, n_3)\}, \mathcal{IC}).$$

Clearly, the unified database $\mathcal{DB}_1 \cup \mathcal{DB}_2$ is inconsistent. It has two preferred repairs, which are $(\emptyset, \{teaches(c_2, n_3)\})$ and $(\emptyset, \{teaches(c_2, n_2)\})$. The corresponding repaired databases are the following:

$$\mathcal{R}_1 = (\{teaches(c_1, n_1),\ teaches(c_2, n_2)\}, \mathcal{IC}),$$

$$\mathcal{R}_2 = (\{teaches(c_1, n_1),\ teaches(c_2, n_3)\}, \mathcal{IC}).$$

Thus, e.g., $teaches(c_1, n_1)$ is true both in the conservative approach and the credulous approach to database integration, while the conclusion $teaches(c_2, n_2)$ is supported only by credulous reasoning.
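Example 1 can be reproduced by brute force. The Python sketch below is an illustration only: atoms are encoded as tuples (a hypothetical choice), the constraint is a predicate over the set of true atoms, and, since inserting teaches facts can only create further violations of this constraint, candidate insertions are restricted to the atoms already in the unified instance, so only retractions are explored.

```python
from itertools import chain, combinations


def teaches(course, teacher):
    return ("teaches", course, teacher)


def one_teacher_per_course(model):
    """forall X,Y,Z: teaches(X,Y) & teaches(X,Z) -> Y = Z"""
    pairs = [(a[1], a[2]) for a in model if a[0] == "teaches"]
    return all(y == z for (x1, y) in pairs for (x2, z) in pairs if x1 == x2)


def subsets(s):
    items = sorted(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]


def leq_i(r1, r2):
    """r1 <=_i r2 (Definition 7): r2 is componentwise contained in r1."""
    return r2[0] <= r1[0] and r2[1] <= r1[1]


def preferred_repairs(d, ics, candidates):
    """<=_i-preferred repairs by brute force; inserted atoms come from `candidates`."""
    reps = [(ins, ret)
            for ins in subsets(candidates - d)
            for ret in subsets(d)
            if all(ic((d | ins) - ret) for ic in ics)]
    return [r for r in reps if all(leq_i(r2, r) for r2 in reps if leq_i(r, r2))]


db1 = {teaches("c1", "n1"), teaches("c2", "n2")}
db2 = {teaches("c2", "n3")}
unified = frozenset(db1 | db2)   # Notation 1, with the union as the operator on ICs
prefs = preferred_repairs(unified, [one_teacher_per_course], unified)
repaired = [(unified | ins) - ret for ins, ret in prefs]
```

Both preferred repairs retract exactly one of the two conflicting facts about $c_2$, and $teaches(c_1, n_1)$ survives in every repaired database.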

Example 2 Consider databases with relations class and supply, of schemas (item, type) and (supplier, department, item), respectively. Let

$$\mathcal{DB}_1 = (\{supply(c_1, d_1, i_1),\ class(i_1, t_1)\}, \mathcal{IC}),$$

$$\mathcal{DB}_2 = (\{supply(c_2, d_2, i_2),\ class(i_2, t_1)\}, \emptyset),$$

where $\mathcal{IC} = \{ \forall X \forall Y \forall Z\, (supply(X, Y, Z) \wedge class(Z, t_1) \rightarrow X = c_1) \}$ states that only supplier $c_1$ can supply items of type $t_1$. Again, $\mathcal{DB}_1 \cup \mathcal{DB}_2$ is inconsistent, and has two preferred repairs: $(\emptyset, \{supply(c_2, d_2, i_2)\})$ and $(\emptyset, \{class(i_2, t_1)\})$. It follows that there are two repaired databases in this case:

$$\mathcal{R}_1 = (\{supply(c_1, d_1, i_1),\ class(i_1, t_1),\ class(i_2, t_1)\}, \mathcal{IC}),$$

$$\mathcal{R}_2 = (\{supply(c_1, d_1, i_1),\ supply(c_2, d_2, i_2),\ class(i_1, t_1)\}, \mathcal{IC}).$$

9. See, e.g., (Arenas et al., 1999; Greco & Zumpano, 2000; Bertossi & Schwind, 2002) for further discussions on these examples.

10. In all the following examples we use set inclusion as the preference criterion, and take the operator $\mathcal{I}$ that combines integrity constraints (see Notation 1) to be the union operator.
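The same brute-force approach, sketched in Python under a hypothetical tuple encoding of atoms, confirms Example 2. As in Example 1, this constraint cannot be repaired by inserting facts, so candidate insertions are restricted to the unified instance. The integrity constraints of the unified database are the union of $\mathcal{IC}$ and the empty set.

```python
from itertools import chain, combinations


def only_c1_supplies_t1(model):
    """forall X,Y,Z: supply(X,Y,Z) & class(Z,t1) -> X = c1"""
    t1_items = {a[1] for a in model if a[0] == "class" and a[2] == "t1"}
    return all(a[1] == "c1" for a in model if a[0] == "supply" and a[3] in t1_items)


def subsets(s):
    items = sorted(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]


def leq_i(r1, r2):
    """r1 <=_i r2 (Definition 7): r2 is componentwise contained in r1."""
    return r2[0] <= r1[0] and r2[1] <= r1[1]


def preferred_repairs(d, ics, candidates):
    """<=_i-preferred repairs by brute force; inserted atoms come from `candidates`."""
    reps = [(ins, ret)
            for ins in subsets(candidates - d)
            for ret in subsets(d)
            if all(ic((d | ins) - ret) for ic in ics)]
    return [r for r in reps if all(leq_i(r2, r) for r2 in reps if leq_i(r, r2))]


db1 = {("supply", "c1", "d1", "i1"), ("class", "i1", "t1")}
db2 = {("supply", "c2", "d2", "i2"), ("class", "i2", "t1")}
unified = frozenset(db1 | db2)
prefs = preferred_repairs(unified, [only_c1_supplies_t1], unified)
repaired = {(unified | ins) - ret for ins, ret in prefs}
```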

Example 3 Let $\mathcal{D}_1 = \{p(a), p(b)\}$, $\mathcal{D}_2 = \{q(a), q(c)\}$, and $\mathcal{IC} = \{\forall X\, (p(X) \rightarrow q(X))\}$. Again, $(\mathcal{D}_1, \emptyset) \cup (\mathcal{D}_2, \mathcal{IC})$ is inconsistent. The corresponding preferred repairs are $(\{q(b)\}, \emptyset)$ and $(\emptyset, \{p(b)\})$. Thus, the repaired databases are the following:

$$\mathcal{R}_1 = (\{p(a), p(b), q(a), q(b), q(c)\}, \mathcal{IC}),$$

$$\mathcal{R}_2 = (\{p(a), q(a), q(c)\}, \mathcal{IC}).$$

In this case, then, both the 'consensus approach' and the 'credulous approach' allow one to infer, e.g., that $p(a)$ holds, while $p(b)$ is supported only by credulous reasoning, and $p(c)$ is not supported by either of these approaches.
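Unlike the previous two examples, Example 3 has a preferred repair that inserts a fact. The Python sketch below therefore draws candidate insertions from all ground $p$ and $q$ atoms over the active domain $\{a, b, c\}$, an assumption that suffices here; the tuple encoding of atoms is hypothetical.

```python
from itertools import chain, combinations


def p_implies_q(model):
    """forall X: p(X) -> q(X)"""
    return all(("q", a[1]) in model for a in model if a[0] == "p")


def subsets(s):
    items = sorted(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]


def leq_i(r1, r2):
    """r1 <=_i r2 (Definition 7): r2 is componentwise contained in r1."""
    return r2[0] <= r1[0] and r2[1] <= r1[1]


def preferred_repairs(d, ics, candidates):
    """<=_i-preferred repairs by brute force; inserted atoms come from `candidates`."""
    reps = [(ins, ret)
            for ins in subsets(candidates - d)
            for ret in subsets(d)
            if all(ic((d | ins) - ret) for ic in ics)]
    return [r for r in reps if all(leq_i(r2, r) for r2 in reps if leq_i(r, r2))]


d1 = {("p", "a"), ("p", "b")}
d2 = {("q", "a"), ("q", "c")}
unified = frozenset(d1 | d2)
domain = {"a", "b", "c"}                                    # active domain
candidates = frozenset((pred, x) for pred in ("p", "q") for x in domain)
prefs = preferred_repairs(unified, [p_implies_q], candidates)
repaired = [(unified | ins) - ret for ins, ret in prefs]
```

The two repaired instances agree on $p(a)$ and disagree on $p(b)$, matching the consensus and credulous conclusions stated above.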

3. Model-based Characterization of Repairs 

In this section we set up a semantics for describing repairs and preferred repairs in terms 
of a corresponding model theory. This will allow us, in particular, to give an alternative 
description of preferred repairs, this time in terms of a preferential semantics for database 
theory. 

As database semantics is usually defined in terms of two-valued (Herbrand) models (cf. Definition 2 and the discussion that precedes it), it is natural to consider two-valued semantics first. We show that arbitrary repairs can be represented by two-valued models of the integrity constraints. When a database is inconsistent, then by definition there is no two-valued interpretation that satisfies both its database instance and its integrity constraints. A standard way to cope with this type of inconsistency is to move to multiple-valued semantics for reasoning with inconsistent and incomplete information (see, e.g., Subrahmanian, 1990, 1994; Messing, 1997; Arieli & Avron, 1999; Arenas, Bertossi, & Kifer, 2000; de Amo, Carnielli, & Marcos, 2002). We further show that repairs can be characterized by three-valued models of the whole database, that is, of the database instance and the integrity constraints. Finally, we concentrate on the most preferred repairs, and show that a certain subset of the three-valued models can be used for characterizing $\leq$-preferred repairs.

Definition 10 Given a valuation $\nu$ and a truth value $x$, denote:¹¹

$$\nu^{x} = \{ p \mid p \text{ is an atomic formula, and } \nu(p) = x \}.$$

The following two propositions characterize repairs in terms of two-valued structures.

Proposition 1 Let $(\mathcal{D}, \mathcal{IC})$ be a database and let $M$ be a two-valued model of $\mathcal{IC}$. Let $\mathrm{Insert} = M^{t} \setminus \mathcal{D}$ and $\mathrm{Retract} = \mathcal{D} \setminus M^{t}$. Then (Insert, Retract) is a repair of $(\mathcal{D}, \mathcal{IC})$.



11. Note, in particular, that in terms of Definition 2, if $\nu = \mathcal{H}^{\mathcal{D}}$ and $x = t$, we have that $\nu^{x} = \mathcal{D}$.
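Proof: The definitions of Insert and Retract immediately imply that $\mathrm{Insert} \cap \mathcal{D} = \emptyset$ and $\mathrm{Retract} \subseteq \mathcal{D}$. For the last condition in Definition 5, note that $\mathcal{D} \cup \mathrm{Insert} \setminus \mathrm{Retract} = \mathcal{D} \cup (M^{t} \setminus \mathcal{D}) \setminus (\mathcal{D} \setminus M^{t}) = M^{t}$. It follows that $M$ is the least Herbrand model of $\mathcal{D} \cup \mathrm{Insert} \setminus \mathrm{Retract}$, and it is also a model of $\mathcal{IC}$; therefore $\mathcal{D} \cup \mathrm{Insert} \setminus \mathrm{Retract} \models \mathcal{IC}$. □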
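Proposition 1 can be sanity-checked on the database $(\{p, r\}, \{p \rightarrow q\})$ of Example 4 below. The Python sketch enumerates all two-valued models of $\mathcal{IC}$ over the finite vocabulary $\{p, q, r\}$ (an assumption: the definitions range over all atoms of $\mathcal{L}$), maps each model to a pair as in Proposition 1, and checks Definition 5. Representing a two-valued model by its set of true atoms $M^{t}$ is a hypothetical encoding.

```python
from itertools import product

ATOMS = ("p", "q", "r")


def two_valued_models(ics, atoms=ATOMS):
    """Yield M^t (the set of true atoms) for every two-valued model M of the constraints."""
    for bits in product((False, True), repeat=len(atoms)):
        true_atoms = frozenset(a for a, bit in zip(atoms, bits) if bit)
        if all(ic(true_atoms) for ic in ics):
            yield true_atoms


def repair_from_model(d, m_true):
    """Proposition 1: Insert = M^t - D and Retract = D - M^t."""
    d = frozenset(d)
    return m_true - d, d - m_true


def is_repair(d, ics, insert, retract):
    """The three conditions of Definition 5, under the closed world assumption."""
    d = frozenset(d)
    return (not (insert & d) and retract <= d
            and all(ic((d | insert) - retract) for ic in ics))
```

The six models of $p \rightarrow q$ over this vocabulary yield six distinct repairs, which are exactly the pairs listed in Example 4.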

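Proposition 2 Let (Insert, Retract) be a repair of a database $(\mathcal{D}, \mathcal{IC})$. Then there is a two-valued model $M$ of $\mathcal{IC}$ such that $\mathrm{Insert} = M^{t} \setminus \mathcal{D}$ and $\mathrm{Retract} = \mathcal{D} \setminus M^{t}$.

Proof: Consider a valuation $M$, defined for every atom $p$ as follows:

$$M(p) = \begin{cases} t & \text{if } p \in \mathcal{D} \cup \mathrm{Insert} \setminus \mathrm{Retract}, \\ f & \text{otherwise.} \end{cases}$$

By its definition, $M$ is the minimal Herbrand model of $\mathcal{D} \cup \mathrm{Insert} \setminus \mathrm{Retract}$. Now, since (Insert, Retract) is a repair of $(\mathcal{D}, \mathcal{IC})$, we have that $\mathcal{D} \cup \mathrm{Insert} \setminus \mathrm{Retract} \models \mathcal{IC}$, thus $M$ is a (two-valued) model of $\mathcal{IC}$. Moreover, since (Insert, Retract) is a repair, necessarily $\mathrm{Insert} \cap \mathcal{D} = \emptyset$ and $\mathrm{Retract} \subseteq \mathcal{D}$, hence we have the following:

• $M^{t} \setminus \mathcal{D} = (\mathcal{D} \cup \mathrm{Insert} \setminus \mathrm{Retract}) \setminus \mathcal{D} = \mathrm{Insert}$,

• $\mathcal{D} \setminus M^{t} = \mathcal{D} \setminus (\mathcal{D} \cup \mathrm{Insert} \setminus \mathrm{Retract}) = \mathrm{Retract}$. □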

When a database is inconsistent, it has no models that satisfy both its integrity constraints and its database instance. One common method to overcome such an inconsistency is to introduce additional truth-values that intuitively represent partial knowledge, different amounts of belief, etc. (see, e.g., Priest, 1989, 1991; Subrahmanian, 1990; Fitting, 1991; Arieli, 1999; Arenas et al., 2000; Avron, 2002). Here we follow this guideline, and consider database integration in the context of a three-valued semantics. The benefit of this is that, as we show below, any database has some three-valued models, from which it is possible to pinpoint the inconsistent information, and accordingly construct repairs.

The underlying three-valued semantics considered here is induced by the algebraic structure $\mathcal{THREE}$, shown in the double-Hasse diagram of Figure 1.¹²

Figure 1: The structure $\mathcal{THREE}$

Intuitively, the elements $t$ and $f$ in $\mathcal{THREE}$ correspond to the usual classical values true and false, while the third element, $\top$, represents inconsistent information (or belief).



12. This structure is used for reasoning with inconsistency by several other three-valued formalisms, such as LFI (Carnielli & Marcos, 2001, 2002) and LP (Priest, 1989, 1991).
Viewed horizontally, $\mathcal{THREE}$ is a complete lattice. According to this view, $f$ is the minimal element, $t$ is the maximal one, and $\top$ is an intermediate element. The corresponding order relation, $\leq_t$, intuitively represents differences in the amount of truth that each element exhibits. We denote the meet, join, and the order reversing operation on $\leq_t$ by $\wedge$, $\vee$, and $\neg$ (respectively). Viewed vertically, $\mathcal{THREE}$ is a semi-upper lattice. In this view, $\top$ is the maximal element and the two 'classical values' are incomparable. This partial order, denoted by $\leq_k$, may be intuitively understood as representing differences in the amount of knowledge (or information) that each element represents.¹³ We denote by $\oplus$ the join operation on $\leq_k$;¹⁴ thus $x \oplus x = x$ for every $x$, while $t \oplus f = \top$.

Various semantic notions can be defined on $\mathcal{THREE}$ as natural generalizations of similar classical ones: a valuation $\nu$ is a function that assigns a truth value in $\mathcal{THREE}$ to each atomic formula. Given a valuation $\nu$, truth values $x_i \in \{t, f, \top\}$, and atomic formulae $p_i$, we shall sometimes write $\nu = \{p_i : x_i\}$ instead of $\nu(p_i) = x_i$ ($i = 1, 2, \ldots$). Any valuation is extended to complex formulae in the obvious way. For instance, $\nu(\neg\psi) = \neg\nu(\psi)$, $\nu(\psi \wedge \phi) = \nu(\psi) \wedge \nu(\phi)$, and so forth.¹⁵ The set of the designated truth values in $\mathcal{THREE}$ (i.e., those elements in $\mathcal{THREE}$ that represent true assertions) consists of $t$ and $\top$. A valuation $\nu$ satisfies a formula $\psi$ iff $\nu(\psi)$ is designated. A valuation that assigns a designated value to every formula in a theory $\mathcal{T}$ is a (three-valued) model of $\mathcal{T}$.
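The structure $\mathcal{THREE}$ and the three-valued satisfaction relation can be sketched in a few lines of Python. The string encoding of truth values and the nested-tuple encoding of formulas are hypothetical. Since the text only spells out $\neg$ and $\wedge$, the sketch reads $\psi \rightarrow \phi$ as $\neg\psi \vee \phi$; this is an assumption, though it agrees with Example 4 below, where $\{p:\top, q:f, r:t\}$ is listed as a model of $p \rightarrow q$.

```python
F, TOP, T = "f", "top", "t"            # hypothetical encoding of f, top and t
TRUTH_RANK = {F: 0, TOP: 1, T: 2}      # positions in the truth order <=_t
DESIGNATED = {T, TOP}


def t_meet(x, y):                      # conjunction: meet w.r.t. <=_t
    return min(x, y, key=TRUTH_RANK.__getitem__)


def t_join(x, y):                      # disjunction: join w.r.t. <=_t
    return max(x, y, key=TRUTH_RANK.__getitem__)


def t_neg(x):                          # negation: order reversal on <=_t
    return {T: F, F: T, TOP: TOP}[x]


def k_leq(x, y):                       # <=_k: only top lies above the classical values
    return x == y or y == TOP


def k_join(x, y):                      # the operator (+): join w.r.t. <=_k
    return x if x == y else TOP


def evaluate(formula, valuation):
    """Extend a valuation (dict: atom -> value) to formulas built from nested tuples."""
    op = formula[0]
    if op == "atom":
        return valuation[formula[1]]
    if op == "not":
        return t_neg(evaluate(formula[1], valuation))
    left = evaluate(formula[1], valuation)
    right = evaluate(formula[2], valuation)
    if op == "and":
        return t_meet(left, right)
    if op == "or":
        return t_join(left, right)
    if op == "implies":                # assumption: psi -> phi read as (not psi) or phi
        return t_join(t_neg(left), right)
    raise ValueError(f"unknown connective: {op!r}")


def satisfies(valuation, formula):
    """A valuation satisfies a formula iff the formula's value is designated."""
    return evaluate(formula, valuation) in DESIGNATED
```

Note that a valuation assigning $\top$ to $p$ satisfies $p \wedge \neg p$, since $\top \wedge \neg\top = \top$; this is what allows inconsistent data to have three-valued models at all.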

Lemma 1 Let $\nu$ and $\mu$ be three-valued valuations such that for every atom $p$, $\nu(p) \geq_k \mu(p)$. Then for every formula $\psi$, $\nu(\psi) \geq_k \mu(\psi)$.

Proof: By induction on the structure of $\psi$. □

We shall write $\nu \geq_k \mu$ if $\nu$ and $\mu$ are three-valued valuations for which the condition of Lemma 1 holds.

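Lemma 2 If $\nu \geq_k \mu$ and $\mu$ is a model of some theory $\mathcal{T}$, then $\nu$ is also a model of $\mathcal{T}$.

Proof: For every formula $\psi \in \mathcal{T}$, $\mu(\psi)$ is designated. Hence, by Lemma 1, for every formula $\psi \in \mathcal{T}$ we have $\nu(\psi) \geq_k \mu(\psi)$, and since the only values that are $\geq_k$ a designated value ($t$ or $\top$) are themselves designated, $\nu(\psi)$ is also designated. Thus $\nu$ is a model of $\mathcal{T}$. □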
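The induction in Lemma 1 hinges on the connective cases: $\neg$, $\wedge$ and $\vee$ must be monotone with respect to $\leq_k$. On a three-element structure this can be verified exhaustively, as in the following Python sketch, which also checks the fact used in Lemma 2 that every value $\geq_k$ a designated value is designated. It covers the propositional connectives only, and its string encoding of truth values is hypothetical.

```python
from itertools import product

VALUES = ("f", "top", "t")
RANK = {"f": 0, "top": 1, "t": 2}      # truth order <=_t
DESIGNATED = {"t", "top"}


def meet(x, y):
    return min(x, y, key=RANK.__getitem__)


def join(x, y):
    return max(x, y, key=RANK.__getitem__)


def neg(x):
    return {"t": "f", "f": "t", "top": "top"}[x]


def k_leq(x, y):                       # knowledge order <=_k
    return x == y or y == "top"


def monotone_unary(op):
    """x <=_k y implies op(x) <=_k op(y)."""
    return all(k_leq(op(x), op(y)) for x, y in product(VALUES, repeat=2) if k_leq(x, y))


def monotone_binary(op):
    """Componentwise <=_k on the arguments implies <=_k on the results."""
    return all(
        k_leq(op(x1, x2), op(y1, y2))
        for x1, x2, y1, y2 in product(VALUES, repeat=4)
        if k_leq(x1, y1) and k_leq(x2, y2)
    )


def designated_upward_closed():
    """The fact used in Lemma 2: anything <=_k-above a designated value is designated."""
    return all(y in DESIGNATED for x, y in product(VALUES, repeat=2)
               if x in DESIGNATED and k_leq(x, y))
```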

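Next we characterize the repairs of a database $\mathcal{DB}$ by its three-valued models:

Proposition 3 Let $(\mathcal{D}, \mathcal{IC})$ be a database and let $M$ be a two-valued model of $\mathcal{IC}$. Consider the three-valued valuation $N$, defined for every atom $p$ by $N(p) = \mathcal{H}^{\mathcal{D}}(p) \oplus M(p)$, and let $\mathrm{Insert} = N^{\top} \setminus \mathcal{D}$ and $\mathrm{Retract} = N^{\top} \cap \mathcal{D}$. Then: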

1. $N$ is a three-valued model of $\mathcal{D} \cup \mathcal{IC}$, and

2. (Insert, Retract) is a repair of $(\mathcal{D}, \mathcal{IC})$.

13. See (Belnap, 1977; Ginsberg, 1988; Fitting, 1991) for a more detailed discussion on these orders and their intuitive meaning.

14. We follow here the notations of Fitting (1990, 1991).

15. As usual, we use here the same logical symbol to denote the connective that appears on the left-hand side of an equation, and the corresponding operator on $\mathcal{THREE}$ that appears on the right-hand side of the same equation.

Proposition 3 shows that repairs of a database $(\mathcal{D}, \mathcal{IC})$ may be constructed in a standard (uniform) way by considering three-valued models that are the $\leq_k$-least upper bounds of two two-valued valuations: the minimal Herbrand model of the database instance, and a two-valued model of the integrity constraints. Proposition 4 below shows that any repair of $(\mathcal{D}, \mathcal{IC})$ is of this form.

Before we give a proof of Proposition 3, let us demonstrate it by a simple example.

Example 4 Let $\mathcal{DB} = (\{p, r\}, \{p \rightarrow q\})$. Then $\mathcal{H}^{\mathcal{D}} = \{p:t, q:f, r:t\}$, and the two-valued models of $\mathcal{IC} = \{p \rightarrow q\}$ are $\{p:t, q:t, r:t\}$, $\{p:t, q:t, r:f\}$, $\{p:f, q:t, r:t\}$, $\{p:f, q:t, r:f\}$, $\{p:f, q:f, r:t\}$, and $\{p:f, q:f, r:f\}$. Thus, the (three-valued) models of the form $\mathcal{H}^{\mathcal{D}} \oplus M$, where $M$ is a two-valued model of $\mathcal{IC}$, are $\{p:t, q:\top, r:t\}$, $\{p:t, q:\top, r:\top\}$, $\{p:\top, q:\top, r:t\}$, $\{p:\top, q:\top, r:\top\}$, $\{p:\top, q:f, r:t\}$, and $\{p:\top, q:f, r:\top\}$. By Proposition 3, the pairs $(\{q\}, \{\})$, $(\{q\}, \{r\})$, $(\{q\}, \{p\})$, $(\{q\}, \{p, r\})$, $(\{\}, \{p\})$, and $(\{\}, \{p, r\})$ are repairs of $\mathcal{DB}$.
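The computation of Example 4 can be replayed mechanically. The Python sketch below enumerates the two-valued models $M$ of $p \rightarrow q$ over $\{p, q, r\}$, forms $N = \mathcal{H}^{\mathcal{D}} \oplus M$ atom by atom, and reads off $(N^{\top} \setminus \mathcal{D}, N^{\top} \cap \mathcal{D})$ as in Proposition 3; the string encoding of truth values and the function name example_4 are hypothetical.

```python
from itertools import product

ATOMS = ("p", "q", "r")
D = frozenset({"p", "r"})


def k_join(x, y):
    """The join w.r.t. <=_k: equal values stay, while t and f combine to top."""
    return x if x == y else "top"


def p_implies_q_two_valued(m):
    return m["p"] == "f" or m["q"] == "t"


def example_4():
    herbrand = {a: "t" if a in D else "f" for a in ATOMS}       # H^D
    results = []
    for bits in product(("t", "f"), repeat=len(ATOMS)):
        m = dict(zip(ATOMS, bits))                              # a two-valued valuation
        if not p_implies_q_two_valued(m):
            continue                                            # keep models of IC only
        n = {a: k_join(herbrand[a], m[a]) for a in ATOMS}       # N = H^D (+) M
        top = frozenset(a for a in ATOMS if n[a] == "top")      # N^top
        results.append((n, (top - D, top & D)))                 # (Insert, Retract)
    return results
```

Of these six repairs, only $(\{q\}, \{\})$ and $(\{\}, \{p\})$ are $\leq_i$-preferred, since each of the other four contains one of them componentwise.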

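Proof of Proposition 3: Since by the definition of $N$, $N \geq_k \mathcal{H}^{\mathcal{D}}$, and since $\mathcal{H}^{\mathcal{D}}$ is a model of $\mathcal{D}$, Lemma 2 implies that $N$ is a model of $\mathcal{D}$. Similarly, $N \geq_k M$, and $M$ is a model of $\mathcal{IC}$, thus by the same lemma $N$ is also a model of $\mathcal{IC}$.

For the second part, observe that an atom outside $\mathcal{D}$ is assigned $\top$ by $N$ exactly when $M$ assigns it $t$, and an atom in $\mathcal{D}$ is assigned $\top$ by $N$ exactly when $M$ assigns it $f$. Hence $\mathrm{Insert} = N^{\top} \setminus \mathcal{D} = M^{t} \setminus \mathcal{D}$ and $\mathrm{Retract} = N^{\top} \cap \mathcal{D} = M^{f} \cap \mathcal{D} = \mathcal{D} \setminus M^{t}$. Now, $M$ is a two-valued model of $\mathcal{IC}$ and hence, by Proposition 1, (Insert, Retract) is a repair of $(\mathcal{D}, \mathcal{IC})$. □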

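Note that the specific form of the three-valued valuations considered in Proposition 3 is essential here, as the proposition does not hold for every three-valued model of $\mathcal{D} \cup \mathcal{IC}$. To see this, consider, e.g., $\mathcal{D} = \{\}$, $\mathcal{IC} = \{p, p \rightarrow q\}$, and a three-valued valuation $N$ that assigns $\top$ to $p$ and $t$ to $q$. Clearly, $N$ is a model of $\mathcal{D} \cup \mathcal{IC}$, but the corresponding update, $(N^{\top} \setminus \mathcal{D}, N^{\top} \cap \mathcal{D}) = (\{p\}, \{\})$, is not a repair of $(\mathcal{D}, \mathcal{IC})$, since $(\{p\}, \mathcal{IC})$ is not a consistent database: under the closed world assumption $q$ is false in $\{p\}$, so $p \rightarrow q$ fails.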
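The counterexample can be checked directly. In the Python sketch below, $p \rightarrow q$ is read as $\neg p \vee q$ (an assumption of the sketch), truth values are encoded as strings (hypothetical), and the three-valued valuation $N = \{p:\top, q:t\}$ satisfies both constraints, while the induced update leads to the instance $\{p\}$, which violates $p \rightarrow q$ under the closed world assumption.

```python
RANK = {"f": 0, "top": 1, "t": 2}      # truth order <=_t


def neg(x):
    return {"t": "f", "f": "t", "top": "top"}[x]


def join(x, y):
    return max(x, y, key=RANK.__getitem__)


def designated(x):
    return x in ("t", "top")


def ic_three_valued(v):
    """IC = {p, p -> q}, with p -> q read as (not p) or q."""
    return designated(v["p"]) and designated(join(neg(v["p"]), v["q"]))


def ic_two_valued(true_atoms):
    """The same constraints, evaluated under the closed world assumption."""
    return "p" in true_atoms and ("p" not in true_atoms or "q" in true_atoms)


d = frozenset()
n = {"p": "top", "q": "t"}
n_top = frozenset(a for a, x in n.items() if x == "top")
insert, retract = n_top - d, n_top & d
```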

Again, as we have noted above, it is possible to show that the converse of Proposition 3 is also true:

Proposition 4 Let (Insert, Retract) be a repair of a database $(\mathcal{D}, \mathcal{IC})$. Then there is a three-valued model $N$ of $\mathcal{D} \cup \mathcal{IC}$ such that $\mathrm{Insert} = N^{\top} \setminus \mathcal{D}$ and $\mathrm{Retract} = N^{\top} \cap \mathcal{D}$.

Proof: Consider a valuation A^, defined for every atom p as follows: 

{T if pG Insert U Retract, 
t if p0 Insert U Retract but pGD, 

/ otherwise. 

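By the definition of $\mathcal{N}$ and since $(\mathit{Insert}, \mathit{Retract})$ is a repair of $(\mathcal{D}, \mathcal{IC})$, we have that $\mathcal{N}^{\top} \setminus \mathcal{D} = (\mathit{Insert} \cup \mathit{Retract}) \setminus \mathcal{D} = \mathit{Insert}$ and $\mathcal{N}^{\top} \cap \mathcal{D} = (\mathit{Insert} \cup \mathit{Retract}) \cap \mathcal{D} = \mathit{Retract}$.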

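It remains to show that $\mathcal{N}$ is a (three-valued) model of $\mathcal{D}$ and $\mathcal{IC}$. It is a three-valued model of $\mathcal{D}$ because for every $p \in \mathcal{D}$, $\mathcal{N}(p) \in \{t, \top\}$. Regarding $\mathcal{IC}$, $(\mathit{Insert}, \mathit{Retract})$ is a repair of $(\mathcal{D}, \mathcal{IC})$, thus every formula in $\mathcal{IC}$ is true in the least Herbrand model $M$ of $\mathcal{D}' = (\mathcal{D} \cup \mathit{Insert}) \setminus \mathit{Retract}$. In particular, $M(q) = t$ for every $q \in \mathcal{D}'$. But since $\mathcal{N}(p) \in \{t, \top\}$ for every $p \in \mathcal{D} \cup \mathit{Insert}$, and $\mathcal{D}' \subseteq \mathcal{D} \cup \mathit{Insert}$, necessarily $\mathcal{N}(q) \in \{t, \top\}$ for every $q \in \mathcal{D}'$. It follows that for every $q \in \mathcal{D}'$, $\mathcal{N}(q) \geq_k M(q) = t$, thus by Lemma 1 and Lemma 2, $\mathcal{N}$ must also be a (three-valued) model of $\mathcal{D}'$. Hence $\mathcal{N}$ is a model of $\mathcal{IC}$. □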
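The valuation built in the proof of Proposition 4 is straightforward to compute. The sketch below is our own Python, with the hypothetical helper `valuation_from_repair`; it constructs $\mathcal{N}$ from a repair and checks, for the six repairs of Example 4, that $\mathcal{N}^{\top} \setminus \mathcal{D}$ and $\mathcal{N}^{\top} \cap \mathcal{D}$ give back Insert and Retract, and that every fact of $\mathcal{D}$ receives a value in $\{t, \top\}$. It does not re-check $\mathcal{IC}$, which would require the three-valued semantics of the connectives.

```python
T, F, TOP = "t", "f", "⊤"

def valuation_from_repair(insert, retract, D, atoms):
    """N(p) = ⊤ on Insert ∪ Retract, t on the remaining facts of D, f elsewhere."""
    changed = insert | retract
    return {p: TOP if p in changed else (T if p in D else F) for p in atoms}

atoms = ["p", "q", "r"]
D = {"p", "r"}
example4_repairs = [({"q"}, set()), ({"q"}, {"r"}), ({"q"}, {"p"}),
                    ({"q"}, {"p", "r"}), (set(), {"p"}), (set(), {"p", "r"})]

for insert, retract in example4_repairs:
    N = valuation_from_repair(insert, retract, D, atoms)
    top = {p for p in atoms if N[p] == TOP}
    assert top - D == insert and top & D == retract   # Insert and Retract are recovered
    assert all(N[p] in (T, TOP) for p in D)           # N is a three-valued model of D

print(valuation_from_repair({"q"}, set(), D, atoms))  # {'p': 't', 'q': '⊤', 'r': 't'}
```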

The last two propositions characterize the repairs of $\mathcal{DB}$ in terms of pairs that are associated with certain three-valued models of $\mathcal{D} \cup \mathcal{IC}$. We shall denote the elements of these pairs as follows:

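Notation 2 Let $\mathcal{N}$ be a three-valued model and let $\mathcal{DB} = (\mathcal{D}, \mathcal{IC})$ be a database. Denote: $\mathit{Insert}^{\mathcal{N}} = \mathcal{N}^{\top} \setminus \mathcal{D}$ and $\mathit{Retract}^{\mathcal{N}} = \mathcal{N}^{\top} \cap \mathcal{D}$.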

We conclude this model-based analysis by characterizing the set of the $\leq$-preferred
repairs, where $\leq$ is one of the preference criteria considered in Definition 7 (i.e., set inclusion
or minimal cardinality). As the propositions below show, common considerations on how 
inconsistent databases can be 'properly' recovered (e.g., keeping the amount of changes 
as minimal as possible, being 'as close as possible' to the original instance, etc.) can be 
captured by preferential models in the context of preferential semantics (Shoham, 1988). 
The idea is to define some order relation on the set of the (three-valued) models of the 
database. This relation intuitively captures some criterion for making preferences among 
the relevant models. Then, only the 'most preferred' models (those that are minimal with 
respect to the underlying order relation) are considered in order to determine how the 
database should be repaired. Below we formalize this idea: 

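Definition 11 Given a database $\mathcal{DB} = (\mathcal{D}, \mathcal{IC})$, denote:

$\mathcal{M}^{\mathcal{DB}} = \{\mathcal{N} \mid \mathcal{N} \geq_k \mathcal{H}^{\mathcal{D}} \oplus M \text{ for some classical model } M \text{ of } \mathcal{IC}\}$.$^{16}$

16. Note that $\mathcal{N}$ is a three-valued valuation and $M$ is a two-valued model of $\mathcal{IC}$.

Example 5 Consider again the database $\mathcal{DB} = (\{p, r\}, \{p \to q\})$ of Example 4. As we have shown, there are six valuations of the form $\mathcal{H}^{\mathcal{D}} \oplus M$, for some two-valued model $M$ of $\mathcal{IC}$, namely:

$\{p{:}t, q{:}\top, r{:}t\}$, $\{p{:}t, q{:}\top, r{:}\top\}$, $\{p{:}\top, q{:}\top, r{:}t\}$,
$\{p{:}\top, q{:}\top, r{:}\top\}$, $\{p{:}\top, q{:}f, r{:}t\}$, $\{p{:}\top, q{:}f, r{:}\top\}$.

The $\leq_k$-minimal models among these models are $\{p{:}t, q{:}\top, r{:}t\}$ and $\{p{:}\top, q{:}f, r{:}t\}$, thus
$\mathcal{M}^{\mathcal{DB}} = \{\mathcal{N} \mid \mathcal{N}(p) \geq_k t,\ \mathcal{N}(q) = \top,\ \mathcal{N}(r) \geq_k t\} \cup \{\mathcal{N} \mid \mathcal{N}(p) = \top,\ \mathcal{N}(q) \geq_k f,\ \mathcal{N}(r) \geq_k t\}$.

Preference orders should reflect some normality considerations applied to the relevant set of valuations ($\mathcal{M}^{\mathcal{DB}}$, in our case): $\nu$ is preferable to $\mu$ if $\nu$ describes a situation that is more common (plausible) than the one described by $\mu$. Hence, a natural way to define preferences in our case is by minimizing inconsistencies. We thus get the following definition:

Definition 12 Let $S$ be a set of three-valued valuations, and $\mathcal{N}_1, \mathcal{N}_2 \in S$.

• $\mathcal{N}_1$ is $\leq_i$-more consistent than $\mathcal{N}_2$, if $\mathcal{N}_1^{\top} \subset \mathcal{N}_2^{\top}$.

• $\mathcal{N}_1$ is $\leq_c$-more consistent than $\mathcal{N}_2$, if $|\mathcal{N}_1^{\top}| < |\mathcal{N}_2^{\top}|$.

• $\mathcal{N} \in S$ is $\leq_i$-maximally consistent in $S$ (respectively, $\mathcal{N}$ is $\leq_c$-maximally consistent in $S$), if there is no $\mathcal{N}' \in S$ that is $\leq_i$-more consistent than $\mathcal{N}$ (respectively, no $\mathcal{N}' \in S$ is $\leq_c$-more consistent than $\mathcal{N}$).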
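Definition 12 can be applied directly to the valuations of Example 5. In the Python sketch below (the helper names are ours), only the six valuations $\mathcal{H}^{\mathcal{D}} \oplus M$ are enumerated: any other element of $\mathcal{M}^{\mathcal{DB}}$ lies $\geq_k$-above one of them and therefore assigns $\top$ to a superset of atoms, so it cannot be strictly more consistent. Both criteria single out the same two valuations.

```python
vals = [
    {"p": "t", "q": "⊤", "r": "t"}, {"p": "t", "q": "⊤", "r": "⊤"},
    {"p": "⊤", "q": "⊤", "r": "t"}, {"p": "⊤", "q": "⊤", "r": "⊤"},
    {"p": "⊤", "q": "f", "r": "t"}, {"p": "⊤", "q": "f", "r": "⊤"},
]

def inconsistent_atoms(n):
    """N^⊤: the atoms to which N assigns the inconsistent value ⊤."""
    return frozenset(a for a, v in n.items() if v == "⊤")

def more_consistent_i(n1, n2):
    return inconsistent_atoms(n1) < inconsistent_atoms(n2)          # proper subset

def more_consistent_c(n1, n2):
    return len(inconsistent_atoms(n1)) < len(inconsistent_atoms(n2))

def maximally_consistent(S, more_consistent):
    return [n for n in S if not any(more_consistent(m, n) for m in S)]

for name, order in [("<=_i", more_consistent_i), ("<=_c", more_consistent_c)]:
    best = maximally_consistent(vals, order)
    print(name, [sorted(inconsistent_atoms(n)) for n in best])
```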

The following propositions show that there is a close relationship between the most consistent models in $\mathcal{M}^{\mathcal{DB}}$ and the preferred repairs of $\mathcal{DB}$.

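Proposition 5 If $\mathcal{N}$ is a $\leq_i$-maximally consistent element in $\mathcal{M}^{\mathcal{DB}}$, then $(\mathit{Insert}^{\mathcal{N}}, \mathit{Retract}^{\mathcal{N}})$ is a $\leq_i$-preferred repair of $\mathcal{DB}$.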

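Proof: By Proposition 3, $(\mathit{Insert}^{\mathcal{N}}, \mathit{Retract}^{\mathcal{N}})$ is a repair of $\mathcal{DB}$. If it is not a $\leq_i$-preferred repair of $\mathcal{DB}$, then there is a repair $(\mathit{Insert}, \mathit{Retract})$ s.t. $\mathit{Insert} \subseteq \mathit{Insert}^{\mathcal{N}}$, $\mathit{Retract} \subseteq \mathit{Retract}^{\mathcal{N}}$, and $\mathit{Insert} \cup \mathit{Retract} \subset \mathit{Insert}^{\mathcal{N}} \cup \mathit{Retract}^{\mathcal{N}}$. By Proposition 4 and its proof, there is an element $\mathcal{N}' \in \mathcal{M}^{\mathcal{DB}}$ s.t. $\mathit{Insert} = \mathit{Insert}^{\mathcal{N}'}$, $\mathit{Retract} = \mathit{Retract}^{\mathcal{N}'}$, and $(\mathcal{N}')^{\top} = \mathit{Insert}^{\mathcal{N}'} \cup \mathit{Retract}^{\mathcal{N}'}$. It follows, then, that $(\mathcal{N}')^{\top} \subset \mathcal{N}^{\top}$, and so $\mathcal{N}$ is not maximally consistent in $\mathcal{M}^{\mathcal{DB}}$, a contradiction to the definition of $\mathcal{N}$. □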

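Proposition 6 Suppose that $(\mathit{Insert}, \mathit{Retract})$ is a $\leq_i$-preferred repair of $\mathcal{DB}$. Then there is a $\leq_i$-maximally consistent element $\mathcal{N}$ in $\mathcal{M}^{\mathcal{DB}}$ s.t. $\mathit{Insert} = \mathit{Insert}^{\mathcal{N}}$ and $\mathit{Retract} = \mathit{Retract}^{\mathcal{N}}$.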

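Proof: The pair $(\mathit{Insert}, \mathit{Retract})$ is in particular a repair of $\mathcal{DB}$, thus by Proposition 2 there is a classical model $M$ of $\mathcal{IC}$ such that $\mathit{Insert} = M^t \setminus \mathcal{D}$ and $\mathit{Retract} = \mathcal{D} \setminus M^t$. Consider the following valuation:

$$\mathcal{N}(p) = \begin{cases} \top & \text{if } p \in M^t \setminus \mathcal{D} \text{ or } p \in \mathcal{D} \setminus M^t,\\ M(p) & \text{otherwise.} \end{cases}$$

First we show that $\mathcal{N} = \mathcal{H}^{\mathcal{D}} \oplus M$. This is so since if $M(p) = \mathcal{H}^{\mathcal{D}}(p)$, then, since $\mathcal{H}^{\mathcal{D}}$ is a minimal Herbrand model of $\mathcal{D}$, necessarily $p \notin M^t \setminus \mathcal{D}$ and $p \notin \mathcal{D} \setminus M^t$, thus $\mathcal{N}(p) = M(p) = M(p) \oplus M(p) = M(p) \oplus \mathcal{H}^{\mathcal{D}}(p)$. Otherwise, if $M(p) \neq \mathcal{H}^{\mathcal{D}}(p)$, then either $M(p) = t$ and $\mathcal{H}^{\mathcal{D}}(p) = f$, i.e., $p \in M^t \setminus \mathcal{D}$, or $M(p) = f$ and $\mathcal{H}^{\mathcal{D}}(p) = t$, i.e., $p \in \mathcal{D} \setminus M^t$. In both cases, $\mathcal{N}(p) = \top = M(p) \oplus \mathcal{H}^{\mathcal{D}}(p)$.$^{17}$ Thus $\mathcal{N} = \mathcal{H}^{\mathcal{D}} \oplus M$, and so $\mathcal{N} \in \mathcal{M}^{\mathcal{DB}}$. Now, by Proposition 2 again, and by the definition of $\mathcal{N}$, $\mathit{Insert}^{\mathcal{N}} = \mathcal{N}^{\top} \setminus \mathcal{D} = [(M^t \setminus \mathcal{D}) \cup (\mathcal{D} \setminus M^t)] \setminus \mathcal{D} = M^t \setminus \mathcal{D} = \mathit{Insert}$, and $\mathit{Retract}^{\mathcal{N}} = \mathcal{N}^{\top} \cap \mathcal{D} = [(M^t \setminus \mathcal{D}) \cup (\mathcal{D} \setminus M^t)] \cap \mathcal{D} = \mathcal{D} \setminus M^t = \mathit{Retract}$.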

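It remains to show that $\mathcal{N}$ is $\leq_i$-maximally consistent in $\mathcal{M}^{\mathcal{DB}}$. Suppose not. Then there is an $\mathcal{N}' \in \mathcal{M}^{\mathcal{DB}}$ s.t. $(\mathcal{N}')^{\top} \subset \mathcal{N}^{\top} = \mathit{Insert} \cup \mathit{Retract}$. By Proposition 3, $(\mathit{Insert}^{\mathcal{N}'}, \mathit{Retract}^{\mathcal{N}'})$ is also a repair of $\mathcal{DB}$. Moreover,

• $\mathit{Insert}^{\mathcal{N}'} = (\mathcal{N}')^{\top} \setminus \mathcal{D} \subseteq \mathcal{N}^{\top} \setminus \mathcal{D} = \mathit{Insert}^{\mathcal{N}} = \mathit{Insert}$,

• $\mathit{Retract}^{\mathcal{N}'} = (\mathcal{N}')^{\top} \cap \mathcal{D} \subseteq \mathcal{N}^{\top} \cap \mathcal{D} = \mathit{Retract}^{\mathcal{N}} = \mathit{Retract}$,

• $\mathit{Insert}^{\mathcal{N}'} \cup \mathit{Retract}^{\mathcal{N}'} = (\mathcal{N}')^{\top} \subset \mathcal{N}^{\top} = \mathit{Insert}^{\mathcal{N}} \cup \mathit{Retract}^{\mathcal{N}} = \mathit{Insert} \cup \mathit{Retract}$.

Hence $(\mathit{Insert}^{\mathcal{N}'}, \mathit{Retract}^{\mathcal{N}'}) <_i (\mathit{Insert}, \mathit{Retract})$, and so $(\mathit{Insert}, \mathit{Retract})$ is not a $\leq_i$-preferred repair of $(\mathcal{D}, \mathcal{IC})$, a contradiction. □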

Propositions 5 and 6 may be formulated in terms of $\leq_c$ as follows:



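17. Here we use the fact that $t \oplus f = \top$.

Proposition 7 If $\mathcal{N}$ is a $\leq_c$-maximally consistent element in $\mathcal{M}^{\mathcal{DB}}$, then $(\mathit{Insert}^{\mathcal{N}}, \mathit{Retract}^{\mathcal{N}})$ is a $\leq_c$-preferred repair of $\mathcal{DB}$.

Proposition 8 Suppose that $(\mathit{Insert}, \mathit{Retract})$ is a $\leq_c$-preferred repair of $\mathcal{DB}$. Then there is a $\leq_c$-maximally consistent element $\mathcal{N}$ in $\mathcal{M}^{\mathcal{DB}}$ s.t. $\mathit{Insert} = \mathit{Insert}^{\mathcal{N}}$ and $\mathit{Retract} = \mathit{Retract}^{\mathcal{N}}$.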

The proofs of the last two propositions are similar to those of Propositions 5 and 6, 
respectively. 

Example 6 Consider again Example 3. We have that:

$\mathcal{UDB} = (\mathcal{D}, \mathcal{IC}) = (\{p(a), p(b), q(a), q(c)\}, \{\forall X\, (p(X) \to q(X))\})$.

Thus, $\mathcal{H}^{\mathcal{D}} = \{p(a){:}t, p(b){:}t, p(c){:}f, q(a){:}t, q(b){:}f, q(c){:}t\}$, and the classical models of $\mathcal{IC}$ are those in which either $p(y)$ is false or $q(y)$ is true for every $y \in \{a, b, c\}$. Now, since in $\mathcal{H}^{\mathcal{D}}$ neither $p(b)$ is false nor $q(b)$ is true, it follows that every element in $\mathcal{M}^{\mathcal{UDB}}$ must assign $\top$ either to $p(b)$ or to $q(b)$. Hence, the $\leq_i$-maximally consistent elements in $\mathcal{M}^{\mathcal{UDB}}$ (which in this case are also the $\leq_c$-maximally consistent elements in $\mathcal{M}^{\mathcal{UDB}}$) are the following:

$M_1 = \{p(a){:}t, p(b){:}\top, p(c){:}f, q(a){:}t, q(b){:}f, q(c){:}t\}$,
$M_2 = \{p(a){:}t, p(b){:}t, p(c){:}f, q(a){:}t, q(b){:}\top, q(c){:}t\}$.

By Propositions 5 and 6, then, the $\leq_i$-preferred repairs of $\mathcal{UDB}$ (which are also its $\leq_c$-preferred repairs) are $(\mathit{Insert}^{M_1}, \mathit{Retract}^{M_1}) = (\emptyset, \{p(b)\})$ and $(\mathit{Insert}^{M_2}, \mathit{Retract}^{M_2}) = (\{q(b)\}, \emptyset)$ (cf. Example 3).
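Example 6 can also be checked by brute force over the six ground atoms. The Python sketch below is an illustration under our own assumptions: the Herbrand universe is restricted to the constants a, b, c, and the helper names are hypothetical. By Propositions 1 and 2, repairs correspond to the pairs $(M^t \setminus \mathcal{D}, \mathcal{D} \setminus M^t)$ for classical models $M$ of $\mathcal{IC}$, so the sketch enumerates those models and then keeps the $\leq_i$-minimal pairs and the pairs of minimal cardinality.

```python
from itertools import product

consts = ["a", "b", "c"]
atoms = [f"{pred}({x})" for pred in ("p", "q") for x in consts]
D = {"p(a)", "p(b)", "q(a)", "q(c)"}

def satisfies_ic(true_atoms):
    """IC: for all X, p(X) -> q(X), over the constants a, b, c."""
    return all(f"p({x})" not in true_atoms or f"q({x})" in true_atoms for x in consts)

repairs = set()
for bits in product([True, False], repeat=len(atoms)):
    mt = {atom for atom, val in zip(atoms, bits) if val}      # M^t for a classical model M
    if satisfies_ic(mt):
        repairs.add((frozenset(mt - D), frozenset(D - mt)))   # (Insert, Retract)

def strictly_below_i(r1, r2):
    """r1 is strictly smaller than r2 under the set-inclusion preference."""
    return r1 != r2 and r1[0] <= r2[0] and r1[1] <= r2[1]

preferred_i = {r for r in repairs if not any(strictly_below_i(s, r) for s in repairs)}
smallest = min(len(i) + len(r) for i, r in repairs)
preferred_c = {r for r in repairs if len(r[0]) + len(r[1]) == smallest}

for insert, retract in sorted(preferred_i, key=str):
    print(sorted(insert), sorted(retract))
```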

Example 7 In Examples 4 and 5, the $\leq_i$-maximally consistent elements (and the $\leq_c$-maximally consistent elements) of $\mathcal{M}^{\mathcal{DB}}$ are $\mathcal{N}_1 = \{p{:}t, q{:}\top, r{:}t\}$ and $\mathcal{N}_2 = \{p{:}\top, q{:}f, r{:}t\}$. It follows that the preferred repairs in this case are $(\{q\}, \emptyset)$ and $(\emptyset, \{p\})$.

To summarize, in this section we have considered a model-based, three-valued preferen- 
tial semantics for database integration. We have shown (Propositions 5-8) that common 
and natural criteria for making preferences among possible repairs (i.e., set inclusion and 
minimal cardinality) can be expressed by order relations on three-valued models of the 
database. The two ways of making preferences (among repairs on one hand and among 
three-valued models on the other hand) are thus strongly related, and induce two alternative approaches for database integration. In the next section we shall consider a third approach to the same problem (aimed at providing an operational semantics for database integration) and relate it to the model-based semantics discussed above.

4. Computing Repairs through Abduction 

In this section we introduce an abductive system that consistently integrates possibly contradicting data-sources. This system computes, for a set of data-sources and a preference criterion $\leq$, the corresponding $\leq$-repaired databases.$^{18}$ Our framework is composed of an abductive logic program (Denecker & Kakas, 2000) and an abductive solver, the A-system (Kakas, Van Nuffelen, & Denecker, 2001; Van Nuffelen & Kakas, 2001), that is based on the abductive refutation procedure SLDNFA (Denecker & De Schreye, 1992, 1998). In the first three parts of this section we describe these components: in Section 4.1 we give a general description of abductive reasoning, in Section 4.2 we show how it can be applied to encode database repairs, and in Section 4.3 we describe the 'computational platform'. Then, in Section 4.4 we demonstrate the computation process by a comprehensive example, and in Section 4.5 we specify soundness and completeness results of our approach (with respect to the basic definitions of Section 2 and the model-based semantics of Section 3). Finally, in Section 4.6 we consider some ways of representing special types of data in the system.

18. It is important to note already at this stage that for computing the $\leq$-repaired databases it is not necessary to produce all the repaired databases.

4.1 Abductive Logic Programming 

We start with a general description of abductive reasoning in the context of logic pro- 
gramming. As usual in logic programming, the language contains constants, functions, and 
predicate symbols. A term is either a variable, a constant, or a compound term $f(t_1, \ldots, t_n)$, where $f$ is an $n$-ary function symbol and the $t_i$ are terms. An atom is an expression of the form $p(t_1, \ldots, t_m)$, where $p$ is an $m$-ary predicate symbol and $t_i$ ($i = 1, \ldots, m$) are terms. A literal is an atom or a negated atom. A denial is an expression of the form $\forall \bar{X}(\leftarrow F)$, where $F$ is a conjunction of literals and $\bar{X}$ is a subset of the variables in $F$. The free variables in $F$ (those that are not in $\bar{X}$) can be considered as placeholders for objects of unspecified identity (Skolem constants). Intuitively, the body $F$ of a denial $\forall \bar{X}(\leftarrow F)$ represents an invalid situation.

Definition 13 (Kakas et al., 1992; Denecker & Kakas, 2000) An abductive logic theory is a triple $\mathcal{T} = (\mathcal{P}, \mathcal{A}, \mathcal{IC})$, where:

• $\mathcal{P}$ is a logic program, consisting of clauses of the form $h \leftarrow l_1 \wedge \ldots \wedge l_n$, where $h$ is an atomic formula and $l_i$ ($i = 1, \ldots, n$) are literals. These clauses are interpreted as definitions for the predicates in their heads,

• $\mathcal{A}$ is a set of abducible predicates, i.e., predicates that do not appear in the head of any clause in $\mathcal{P}$,

• $\mathcal{IC}$ is a set of first-order formulae, called integrity constraints.

All the main model semantics of logic programming can be extended to abductive logic programming. This includes two-valued completion (Console, Theseider Dupré, & Torasso, 1991) and three-valued completion semantics (Denecker & De Schreye, 1993), extended well-founded semantics (Pereira, Aparicio, & Alferes, 1991), and generalized stable semantics (Kakas & Mancarella, 1990b). These semantics can be defined in terms of arbitrary interpretations (Denecker & De Schreye, 1993), but generally they are based on Herbrand interpretations. The effect of this restriction on the semantics of the abductive theory is that a domain closure condition is imposed: the domain of interpretation is known to be the Herbrand universe. A model of an abductive theory under any of these semantics is a Herbrand interpretation $\mathcal{H}$ for which there exists a collection of ground abducible facts $\Delta$, such that $\mathcal{H}$ is a model of the logic program $\mathcal{P} \cup \Delta$ (with respect to the corresponding semantics of logic programming) and $\mathcal{H}$ classically satisfies every element of $\mathcal{IC}$.
Similarly, for any of the main semantics $S$ of logic programming, one can define the notion of an abductive solution for a query and an abductive theory.

Definition 14 (Kakas et al., 1992; Denecker & Kakas, 2000) An (abductive) solution for a theory $(\mathcal{P}, \mathcal{A}, \mathcal{IC})$ and a query $Q$ is a set $\Delta$ of ground abducible atoms, each one having a predicate symbol in $\mathcal{A}$, together with an answer substitution $\theta$, such that the following three conditions are satisfied:

a) $\mathcal{P} \cup \Delta$ is consistent in the semantics $S$,

b) $\mathcal{P} \cup \Delta \models_S \mathcal{IC}$,

c) $\mathcal{P} \cup \Delta \models_S \forall Q\theta$.

In the next section we will use an abductive theory with a non-recursive program to 
model the database repairs. The next proposition shows that for such abductive theories all 
Herbrand semantics coincide, and models correspond to abductive solutions for the query 
true. 

Proposition 9 Let $\mathcal{T} = (\mathcal{P}, \mathcal{A}, \mathcal{IC})$ be an abductive theory such that $\mathcal{P}$ is a non-recursive program. Then $\mathcal{H}$ is a Herbrand model of the three-valued completion semantics, iff $\mathcal{H}$ is a Herbrand model of the two-valued completion semantics, iff $\mathcal{H}$ is a generalized stable model of $\mathcal{T}$, iff $\mathcal{H}$ is a generalized well-founded model of $\mathcal{T}$.

If $\mathcal{H}$ is a model of $\mathcal{T}$, then the set $\Delta$ of abducible atoms in $\mathcal{H}$ is an abductive solution for the query true. Conversely, for every abductive solution $\Delta$ for true, there exists a unique model $\mathcal{H}$ of $\mathcal{T}$ such that $\Delta$ is the set of true abducible atoms in $\mathcal{H}$.

Proof: The proof is based on the well-known fact that for non-recursive logic programs, all the main semantics of logic programming coincide. In particular, for a non-recursive logic program $\mathcal{P}$ there is a Herbrand interpretation $\mathcal{H}$, which is the unique model under each semantics (see, for example, Denecker & De Schreye, 1993).

Let $\mathcal{H}$ be a model of $\mathcal{T} = (\mathcal{P}, \mathcal{A}, \mathcal{IC})$ under any of the four semantics mentioned above. Then there exists a collection of ground abducible facts $\Delta$, such that $\mathcal{H}$ is a model of the logic program $\mathcal{P} \cup \Delta$ under the corresponding semantics of logic programming. Since $\mathcal{P}$ is non-recursive, so is $\mathcal{P} \cup \Delta$. By the above observation, $\mathcal{H}$ is the unique model of $\mathcal{P} \cup \Delta$ under any of the above-mentioned semantics. Hence, $\mathcal{H}$ is a model of $\mathcal{T}$ under any of the other semantics. This proves the first part of the proposition.

When $\mathcal{H}$ is a Herbrand model of $\mathcal{T}$, there is a set $\Delta$ of abducible atoms such that $\mathcal{H}$ is a model of $\mathcal{P} \cup \Delta$. Clearly, $\Delta$ must be the set of true abducible atoms in $\mathcal{H}$. Then $\mathcal{P} \cup \Delta$ is obviously consistent, and it entails the integrity constraints of $\mathcal{T}$ as well as the query true. Hence, $\Delta$ is an abductive solution for true. Conversely, for any set $\Delta$ of abducible atoms, $\mathcal{P} \cup \Delta$ has a unique model $\mathcal{H}$, and the set of true abducible atoms in $\mathcal{H}$ is $\Delta$. When $\Delta$ is an abductive solution for true, $\mathcal{H}$ satisfies the integrity constraints, and hence $\mathcal{H}$ is a model of $\mathcal{T}$. Consequently, $\mathcal{H}$ is the unique model of $\mathcal{T}$ whose set of true abducible atoms is $\Delta$. □
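Proposition 9 relies on the fact that a non-recursive program has a unique model, which can be computed bottom-up along the predicate dependency graph. The Python sketch below illustrates this on a small propositional theory of our own (it is not taken from the paper): abducibles a and b, rules c ← a and d ← b ∧ ¬a, and the single denial ← ¬c ∧ ¬d. Each candidate $\Delta$ determines exactly one model of $\mathcal{P} \cup \Delta$, and the $\Delta$ whose model satisfies the constraint are the abductive solutions for the query true. It uses `graphlib`, which is part of the Python standard library from version 3.9.

```python
from itertools import combinations
from graphlib import TopologicalSorter

# Hypothetical theory: rules are (head, positive body, negative body).
RULES = [("c", ("a",), ()), ("d", ("b",), ("a",))]
ABDUCIBLES = ["a", "b"]

def unique_model(rules, delta):
    """The unique model of the non-recursive program P ∪ Δ, computed bottom-up."""
    graph = {}
    for head, pos, neg in rules:
        graph.setdefault(head, set()).update(pos, neg)   # head depends on its body atoms
    model = set(delta)
    for atom in TopologicalSorter(graph).static_order():  # body atoms come first
        if any(head == atom
               and all(b in model for b in pos)
               and not any(b in model for b in neg)
               for head, pos, neg in rules):
            model.add(atom)
    return model

def satisfies_ic(model):
    # The denial <- not c, not d is violated exactly when both c and d are false.
    return "c" in model or "d" in model

solutions = []
for k in range(len(ABDUCIBLES) + 1):
    for delta in combinations(ABDUCIBLES, k):
        model = unique_model(RULES, delta)
        if satisfies_ic(model):
            solutions.append((set(delta), model))

for delta, model in solutions:
    print(sorted(delta), sorted(model))
```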

In addition to the standard properties of abductive solutions for a theory T and a query 
Q, specified in Definition 14, one frequently imposes optimization conditions on the solutions 
$\Delta$, analogous to those found in the context of database repairs. Two frequently used criteria
are that the generated abductive solution A should be minimal with respect to set inclusion 
or with respect to set cardinality (cf. Definition 7). The fact that the same preference 
criteria are used for choosing appropriate abductive solutions and for selecting preferred 
database repairs does not necessarily mean that there is a natural mapping between the 
corresponding solutions. In the next sections we will show, however, that meta-programming 
allows us to map a database repair problem into an abductive problem (w.r.t. the same
type of preference criterion).

4.2 An Abductive Meta-Program for Encoding Database Repairs 

The task of repairing the union of $n$ given databases $\mathcal{DB}_i$ with respect to the integration of the local integrity constraints $\mathcal{IC}$ can be represented by an abductive theory $\mathcal{T} = (\mathcal{P}, \mathcal{A}, \mathcal{IC}')$, where $\mathcal{P}$ is a meta-program encoding how a new database is obtained by updating the existing databases, $\mathcal{A}$ is the set {insert, retract} of abducible predicates used to describe updates, and $\mathcal{IC}'$ encodes the integrity constraints. In $\mathcal{P}$, facts $p$ that appear in at least one of the databases are encoded by atomic rules db(p), and facts $p$ that appear in the updated database are represented by atoms fact(p). The latter predicate is defined as follows:

fact(X) ← db(X) ∧ ¬retract(X)
fact(X) ← insert(X)

To ensure that the predicates insert and retract encode a proper update of the database, the following integrity constraints are also specified:

• An inserted element should not belong to a given database:
← insert(X) ∧ db(X)

• A retracted element should belong to some database:
← retract(X) ∧ ¬db(X)

The set of integrity constraints $\mathcal{IC}'$ is obtained by a straightforward transformation from $\mathcal{IC}$: every occurrence of a database fact $p$ in some integrity constraint is replaced by fact(p).$^{19}$

Example 8 (Example 1, revisited) Figure 2 contains the meta-program encoding Example 1 (the code for Examples 2 and 3 is similar).

As noted in Section 4.1, under any of the main semantics of abductive logic programming there is a one-to-one correspondence between repairs of the composed database $\mathcal{DB}$ and the Herbrand models of its encoding, the abductive meta-theory $\mathcal{T}$. Consequently, abduction
can be used to compute repairs. In the following sections we introduce an abductive method 
for this purpose. 



19. Since our abductive system (see Section 4.3) accepts integrity constraints only in denial form, if the elements of $\mathcal{IC}'$ are not in this form, the Lloyd-Topor transformation (Lloyd & Topor, 1984) may also be applied here; we consider this case in Section 4.3.2.
% System definitions:
defined(fact(_))
defined(db(_))
abducible(insert(_))
abducible(retract(_))

% The composer:
fact(X) ← db(X) ∧ ¬retract(X)
fact(X) ← insert(X)
← insert(X) ∧ db(X)
← retract(X) ∧ ¬db(X)

% The databases:
db(teaches(c1,n1))                                  % D1
db(teaches(c2,n2))
db(teaches(c2,n3))                                  % D2
Y = Z ← fact(teaches(X,Y)) ∧ fact(teaches(X,Z))     % IC

Figure 2: A meta-program for Example 1
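For a ground version of Figure 2, the abductive solutions can be enumerated by brute force, which is convenient for cross-checking small examples. The Python sketch below is not the A-system and does not use SLDNFA. It assumes a finite Herbrand base built from the courses c1, c2 and the teachers n1, n2, n3, encodes teaches(c, n) as the pair (c, n), and builds the two composer constraints into the search space (inserted atoms are drawn from outside db, retracted atoms from inside it). It keeps the solutions that are minimal under set inclusion.

```python
from itertools import chain, combinations

courses, teachers = ["c1", "c2"], ["n1", "n2", "n3"]
db = {("c1", "n1"), ("c2", "n2"), ("c2", "n3")}             # db(teaches(C, N)) facts
herbrand_base = {(c, n) for c in courses for n in teachers}   # assumed finite universe

def facts(insert, retract):
    # fact(X) <- db(X) ∧ ¬retract(X)  and  fact(X) <- insert(X)
    return (db - retract) | insert

def ic_holds(fs):
    # Y = Z <- fact(teaches(X,Y)) ∧ fact(teaches(X,Z)): one teacher per course
    return all(len({n for c, n in fs if c == course}) <= 1 for course in courses)

def subsets(xs):
    xs = sorted(xs)
    return chain.from_iterable(combinations(xs, k) for k in range(len(xs) + 1))

solutions = []
for ins in subsets(herbrand_base - db):    # respects  <- insert(X) ∧ db(X)
    for ret in subsets(db):                # respects  <- retract(X) ∧ ¬db(X)
        if ic_holds(facts(set(ins), set(ret))):
            solutions.append((frozenset(ins), frozenset(ret)))

minimal = [s for s in solutions
           if not any(t != s and t[0] <= s[0] and t[1] <= s[1] for t in solutions)]
print(minimal)
```

The two minimal solutions retract either teaches(c2,n2) or teaches(c2,n3), with no insertions.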



4.3 The Abductive Computational Model (The A-system)

Below we describe the abductive system that will be used to compute database repairs. The A-system (Kakas, Van Nuffelen, & Denecker, 2001; Van Nuffelen & Kakas, 2001) is a tool combining abductive logic theories and constraint logic programming (CLP). It is a synthesis of the refutation procedures SLDNFA (Denecker & De Schreye, 1998) and ACLP (Kakas et al., 2000), together with an improved control strategy. The essence of the A-system is a reduction of a high-level specification to a lower-level constraint store, which is managed by a constraint solver. See http://www.cs.kuleuven.ac.be/~dtai/kt/ for the latest version of the system.$^{20}$ Below we review the theoretical background as well as some practical considerations behind this system. For more information, see (Denecker & De Schreye, 1998) and (Kakas, Van Nuffelen, & Denecker, 2001).

4.3.1 Abductive Inference 

The input to the A-system is an abductive theory $\mathcal{T} = (\mathcal{P}, \mathcal{A}, \mathcal{IC})$, where $\mathcal{IC}$ consists of universally quantified denials. The process of answering a query $Q$, given by a conjunction of literals, can be described as a derivation for $Q$ through rewriting states. A state is a pair $(\mathcal{G}, \mathcal{ST})$, where $\mathcal{G}$, the set of goal formulae, is a set of conjunctions of literals and denials. During the rewriting process the elements in $\mathcal{G}$ (the goals) are reduced to basic formulae that are stored in the structure $\mathcal{ST}$. This structure is called a store, and it consists of the following elements:$^{21}$

• a set $\Delta$ that contains abducibles $a(\bar{t})$;

• a set $\Delta^*$ that contains denials of the form $\forall \bar{X}(\leftarrow a(\bar{t}) \wedge Q)$, where $a(\bar{t})$ is an abducible. Such a denial may contain free variables;

• a set $\mathcal{E}$ of equalities and inequalities over terms.

20. This version runs on top of SICStus Prolog 3.10.1 or later versions.

The consistency of $\mathcal{E}$ is maintained by a constraint solver that uses the Martelli and Montanari unification algorithm (Martelli & Montanari, 1982) for the equalities and constructive negation for the inequalities.
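For readers unfamiliar with it, the Martelli-Montanari algorithm rewrites a set of equations into solved form (or fails) by deletion, orientation, decomposition, variable elimination, and the occurs check. The Python sketch below illustrates it for equalities only; it is not the A-system's solver and does not cover constructive negation for inequalities. The term encoding is our own assumption: variables are strings starting with an upper-case letter, constants are other strings, and a compound term $f(t_1, \ldots, t_n)$ is the tuple `("f", t1, ..., tn)`.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def occurs(v, t):
    """Does variable v occur in term t?"""
    return t == v or (isinstance(t, tuple) and any(occurs(v, a) for a in t[1:]))

def substitute(t, v, s):
    """Replace every occurrence of variable v in term t by term s."""
    if t == v:
        return s
    if isinstance(t, tuple):
        return (t[0],) + tuple(substitute(a, v, s) for a in t[1:])
    return t

def unify(s, t):
    """Return a solved form {var: term} representing an mgu of s and t, or None."""
    equations, solved = [(s, t)], {}
    while equations:
        l, r = equations.pop()
        if l == r:                                    # delete: trivial equation
            continue
        if not is_var(l) and is_var(r):               # orient: variable on the left
            l, r = r, l
        if is_var(l):
            if occurs(l, r):                          # occurs check
                return None
            # eliminate: replace l by r in all remaining and solved equations
            equations = [(substitute(a, l, r), substitute(b, l, r)) for a, b in equations]
            solved = {v: substitute(u, l, r) for v, u in solved.items()}
            solved[l] = r
        elif (isinstance(l, tuple) and isinstance(r, tuple)
              and l[0] == r[0] and len(l) == len(r)):
            equations.extend(zip(l[1:], r[1:]))       # decompose
        else:
            return None                               # clash of functors or constants
    return solved

print(unify(("teaches", "X", "n1"), ("teaches", "c1", "Y")))   # {'Y': 'n1', 'X': 'c1'}
print(unify(("teaches", "c1", "n1"), ("teaches", "c2", "Y")))  # None
```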

A state $S = (\mathcal{G}, \mathcal{ST})$ is called consistent if $\mathcal{G}$ does not contain false and $\mathcal{ST}$ is consistent (since $\Delta$ and $\Delta^*$ are kept consistent with each other and with $\mathcal{E}$, the latter condition is equivalent to the consistency of $\mathcal{E}$). A consistent state with an empty set of goals ($\mathcal{G} = \emptyset$) is called a solution state.

A derivation starts with an initial state $(\mathcal{G}_0, \mathcal{ST}_0)$, where every element in $\mathcal{ST}_0$ is empty, and the initial goal set $\mathcal{G}_0$ contains the query $Q$ and all the integrity constraints $\mathcal{IC}$ of the theory $\mathcal{T}$. Then a sequence of rewriting steps is performed. A step starts in a certain state $S_i = (\mathcal{G}_i, \mathcal{ST}_i)$, selects a goal in $\mathcal{G}_i$, and applies an inference rule (see below) to obtain a new consistent state $S_{i+1}$. When no consistent state can be reached from $S_i$, the derivation backtracks. A derivation terminates when a solution state is reached; otherwise it fails (see Section 4.4 below for a demonstration of this process).

Next we present the inference rules in the A-system, using the following conventions:

• $\mathcal{G}_i^- = \mathcal{G}_i \setminus \{F\}$, where $F$ is the selected goal formula.

• OR and SELECT denote nondeterministic choices in an inference rule.

• $Q$ is a conjunction of literals, possibly empty. Since an empty conjunction is equivalent to true, the denial $\leftarrow Q$ with empty $Q$ is equivalent to false.

• If $\Delta$, $\Delta^*$, and $\mathcal{E}$ are not mentioned, they remain unchanged.

The inference rules are classified in four groups, named after the leftmost literal in the 
selected formula (shown in bold). Each group contains rules for (positive) conjunctions of 
literals and rules for denials. 

1. Defined predicates: 

The inference rules unfold the bodies of a defined predicate. For positive conjunctions 
this corresponds to standard resolution with a selected clause, whereas in the denial 
all clauses are used because every clause leads to a new denial. 

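D.1 $p(\bar{t}) \wedge Q$:

Let $p(\bar{s}_j) \leftarrow B_j \in \mathcal{P}$ ($j = 1, \ldots, n$) be the $n$ clauses with $p$ in the head. Then:
$\mathcal{G}_{i+1} = \mathcal{G}_i^- \cup \{\bar{t} = \bar{s}_1 \wedge B_1 \wedge Q\}$ OR $\ldots$ OR $\mathcal{G}_{i+1} = \mathcal{G}_i^- \cup \{\bar{t} = \bar{s}_n \wedge B_n \wedge Q\}$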



21. The actual implementation of the A-system also contains a store for finite domain constraint expressions. This store is not needed for the application here, and hence it is omitted.

D.2 $\forall \bar{X}(\leftarrow p(\bar{t}) \wedge Q)$:

$\mathcal{G}_{i+1} = \mathcal{G}_i^- \cup \{\forall \bar{X}, \bar{Y}(\leftarrow \bar{t} = \bar{s} \wedge B \wedge Q) \mid \text{there is } p(\bar{s}) \leftarrow B \in \mathcal{P} \text{ with variables } \bar{Y}\}$

2. Negations: 

Resolving negation corresponds to 'switching' the mode of reasoning from a positive 
literal to a denial and vice versa. This is similar to the idea of negation-as-failure in 
logic programming. 

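N.1 $\neg p(\bar{t}) \wedge Q$:

$\mathcal{G}_{i+1} = \mathcal{G}_i^- \cup \{Q, \leftarrow p(\bar{t})\}$

N.2 $\forall \bar{X}(\leftarrow \neg p(\bar{t}) \wedge Q)$ and $\bar{t}$ does not contain variables in $\bar{X}$:

$\mathcal{G}_{i+1} = \mathcal{G}_i^- \cup \{p(\bar{t})\}$ OR $\mathcal{G}_{i+1} = \mathcal{G}_i^- \cup \{\leftarrow p(\bar{t}), \forall \bar{X}(\leftarrow Q)\}$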

3. Abducibles: 

The first rule is responsible for the creation of new hypotheses. Both rules ensure that 
the elements in A are consistent with those in A*. 

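A.1 $a(\bar{t}) \wedge Q$:

SELECT an arbitrary $a(\bar{s}) \in \Delta_i$ and define $\mathcal{G}_{i+1} = \mathcal{G}_i^- \cup \{Q\} \cup \{\bar{s} = \bar{t}\}$
OR $\mathcal{G}_{i+1} = \mathcal{G}_i^- \cup \{Q\} \cup \{\forall \bar{X}(\leftarrow \bar{s} = \bar{t} \wedge R) \mid \forall \bar{X}(\leftarrow a(\bar{s}) \wedge R) \in \Delta_i^*\} \cup \{\leftarrow (\bar{t} = \bar{s}) \mid a(\bar{s}) \in \Delta_i\}$ and $\Delta_{i+1} = \Delta_i \cup \{a(\bar{t})\}$

A.2 $\forall \bar{X}(\leftarrow a(\bar{t}) \wedge Q)$:

$\mathcal{G}_{i+1} = \mathcal{G}_i^- \cup \{\forall \bar{X}(\leftarrow \bar{s} = \bar{t} \wedge Q) \mid a(\bar{s}) \in \Delta_i\}$ and $\Delta_{i+1}^* = \Delta_i^* \cup \{\forall \bar{X}(\leftarrow a(\bar{t}) \wedge Q)\}$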

4. Equalities: 

These inference rules isolate the (in)equalities, so that the constraint solver can eval- 
uate them. The first rule applies to equalities in goal formulae: 

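E.1 $s = t \wedge Q$:

$\mathcal{G}_{i+1} = \mathcal{G}_i^- \cup \{Q\}$ and $\mathcal{E}_{i+1} = \mathcal{E}_i \cup \{s = t\}$

The following three rules handle equalities in denials. Which rule applies depends on whether $s$ or $t$ contain free or universally quantified variables. In these rules $Q[X/t]$ denotes the formula that is obtained from $Q$ by substituting the term $t$ for $X$.

E.2 $\forall \bar{X}(\leftarrow s = t \wedge Q)$:

If $s$ and $t$ are not unifiable then $\mathcal{G}_{i+1} = \mathcal{G}_i^-$;

Otherwise, let $E_s$ be the equation set in solved form representing a most general unifier of $s$ and $t$ (Martelli & Montanari, 1982). $\mathcal{G}_{i+1} = \mathcal{G}_i^- \cup \{\forall \bar{X}(\leftarrow E_s \wedge Q)\}$.

E.3 $\forall X, \bar{Y}(\leftarrow X = t \wedge Q)$ where $t$ is a term not containing $X$:

$\mathcal{G}_{i+1} = \mathcal{G}_i^- \cup \{\forall \bar{Y}(\leftarrow Q[X/t])\}$

E.4 $\forall \bar{X}, \bar{Y}(\leftarrow X = t \wedge Q)$ where $X$ is a free variable and $\bar{X}$ is the set of universally quantified variables in the term $t$:

$\mathcal{E}_{i+1} = \mathcal{E}_i \cup \{\forall \bar{X}(X \neq t)\}$ OR $\mathcal{G}_{i+1} = \mathcal{G}_i^- \cup \{X = t\} \cup \{\forall \bar{Y}(\leftarrow Q[X/t])\}$$^{22}$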



22. In its first branch the inference E.4 explores the condition $\forall \bar{X}(X \neq t)$. In the second branch, the negation of this condition is explored. Here $X$ is identical to $t$, for some values assigned to $\bar{X}$. This is why in the second branch, the universally quantified variables $\bar{X}$ are turned into free variables which may appear free in $\forall \bar{Y}(\leftarrow Q[X/t])$.
As usual, one has to check for floundering negation. This occurs when the inference rule N.2 is applied on a denial with universally quantified variables in the negative literal $\neg p(\bar{t})$. Floundering aborts the derivation.

An answer substitution $\theta$, derived from a solution state $S$, is any substitution $\theta$ of the free variables in $S$ which satisfies $\mathcal{E}$ (i.e., $\theta(\mathcal{E})$ is true) and grounds $\Delta$. Note that, in case of an abductive theory without abducibles and integrity constraints, computed answers as defined by Lloyd (1987) are most general unifiers of $\mathcal{E}$, and correct answers are answer substitutions as defined above.

Proposition 10 (Kakas, Van Nuffelen, & Denecker, 2001) Let $\mathcal{T} = (\mathcal{P}, \mathcal{A}, \mathcal{IC})$ be an abductive theory, $Q$ a query, $S$ a solution state of a derivation for $Q$, and $\theta$ an answer substitution of $S$. Then the pair consisting of the ground abducible atoms $\theta(\Delta(S))$ and the answer substitution $\theta$ is an abductive solution for $\mathcal{T}$ and $Q$.

4.3.2 Constraint Transformation to Denial Form 

Since the inference rules of the A-system are applied only on integrity constraints in denial form, the integrity constraints $\mathcal{IC}$ in the abductive theory $\mathcal{T}$ must be translated to this form. This is done by applying a variant of the Lloyd-Topor transformation (Lloyd & Topor, 1984) on the integrity constraints (see Denecker & De Schreye, 1998). This is the same procedure as the well-known procedure used in deductive databases to convert a first-order quantified query $Q$ into a logically equivalent pair of an atomic query and a non-recursive datalog procedure. The transformation is defined as a rewriting process of sets of formulae: the initial set is $\{\leftarrow \neg F \mid F \in \mathcal{IC}\}$, and the transformation is done by applying De Morgan and various distribution rules. New predicates and rules may be introduced during the transformation in order to deal with universal quantifications in denials. Below we illustrate the transformation in the case of the integrity constraints of the running example.

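Example 9 Consider the following extension of the integrity constraints of Example 1:

$\mathcal{IC} = \{\ \forall X \forall Y \forall Z\, (\mathit{teaches}(X, Y) \wedge \mathit{teaches}(X, Z) \to Y = Z),\ \ \forall X\, (\mathit{teacher}(X) \to \exists Y\, \mathit{teaches}(Y, X))\ \}$.

Note that in addition to the original integrity constraint of Example 1, here we also demand that every teacher has to give at least one course.

• Lloyd-Topor transformation on the first integrity constraint:

(1) $\leftarrow \neg \forall X \forall Y \forall Z\, (\neg \mathit{teaches}(X, Y) \vee \neg \mathit{teaches}(X, Z) \vee Y = Z)$

(2) $\leftarrow \exists X \exists Y \exists Z\, (\mathit{teaches}(X, Y) \wedge \mathit{teaches}(X, Z) \wedge Y \neq Z)$

(3) $\forall X \forall Y \forall Z\, (\leftarrow \mathit{teaches}(X, Y) \wedge \mathit{teaches}(X, Z) \wedge Y \neq Z)$

• Lloyd-Topor transformation on the second integrity constraint:

(1) $\leftarrow \neg \forall X\, (\neg \mathit{teacher}(X) \vee \exists Y\, \mathit{teaches}(Y, X))$

(2) $\forall X\, (\leftarrow \mathit{teacher}(X) \wedge \neg \exists Y\, \mathit{teaches}(Y, X))$

(3) $\forall X\, (\leftarrow \mathit{teacher}(X) \wedge \neg \mathit{gives\_courses}(X))$, where gives_courses is defined by:

$\mathit{gives\_courses}(X) \leftarrow \exists Y\, \mathit{teaches}(Y, X)$

(4) $\forall X\, (\leftarrow \mathit{teacher}(X) \wedge \neg \mathit{gives\_courses}(X))$, and

$\mathit{gives\_courses}(X) \leftarrow \mathit{teaches}(Y, X)$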
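The transformation preserves the meaning of the constraints, which can be spot-checked on finite data. The Python sketch below (function names are ours) evaluates both constraints of Example 9 in their original form and in their final denial form, with gives_courses computed from teaches as in step (4). It does so over every subset of the teaches facts of Example 1, with teacher facts for n1, n2 and n3 as used in Figure 4. This is a sanity check on one small database, not a proof of equivalence.

```python
from itertools import combinations

teacher = {"n1", "n2", "n3"}
teaches_db = {("c1", "n1"), ("c2", "n2"), ("c2", "n3")}   # teaches(Course, Teacher)

def ic1_original(teaches):
    # forall X,Y,Z (teaches(X,Y) ∧ teaches(X,Z) -> Y = Z)
    return all(y == z for (x1, y) in teaches for (x2, z) in teaches if x1 == x2)

def ic1_denial_violated(teaches):
    # the body of  <- teaches(X,Y) ∧ teaches(X,Z) ∧ Y ≠ Z  has an instance
    return any(x1 == x2 and y != z for (x1, y) in teaches for (x2, z) in teaches)

def ic2_original(teacher, teaches):
    # forall X (teacher(X) -> exists Y teaches(Y,X))
    return all(any(t == x for _, t in teaches) for x in teacher)

def ic2_denial_violated(teacher, teaches):
    gives_courses = {t for _, t in teaches}       # gives_courses(X) <- teaches(Y,X)
    # the body of  <- teacher(X) ∧ ¬gives_courses(X)  has an instance
    return any(x not in gives_courses for x in teacher)

facts = sorted(teaches_db)
for k in range(len(facts) + 1):
    for subset in combinations(facts, k):
        sub = set(subset)
        assert ic1_original(sub) == (not ic1_denial_violated(sub))
        assert ic2_original(teacher, sub) == (not ic2_denial_violated(teacher, sub))
print("original and denial forms agree on all", 2 ** len(facts), "subsets")
```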

4.3.3 Control Strategy 

The selection strategy applied during the derivation process is crucial. A Prolog-like selection strategy (leftmost first, depth first) often leads to thrashing, because it is blind to other choices, and it does not result in a global overview of the current state of the computation. In the development of the A-system the main focus was on the improvement of the control strategy. The idea is to apply first those rules that result in a deterministic change of the state, so that information is propagated. If no such rule is applicable, then one of the left-over choices is selected. By this strategy, commitment to a choice is suspended until no other information can be derived in a deterministic way. This resembles a CLP solver, in which the constraints propagate their information as soon as a choice is made. This propagation can reduce the number of choices to be made and thus often increases the performance dramatically.
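The selection policy just described can be sketched as a goal-selection function. The Python below is our own simplification, not the A-system's code: each pending goal is annotated with the number of alternatives its applicable inference rule would open (for instance, D.1 on a predicate with n clauses opens n branches, while D.2 is deterministic). Goals with at most one alternative (an immediate failure or a deterministic step) are chosen first; only when none remains does the function commit to a choice point, and preferring the one with the fewest branches is our own assumption. The counts are a static snapshot, whereas in a real derivation they change as information propagates.

```python
from dataclasses import dataclass

@dataclass
class Goal:
    formula: str
    alternatives: int   # branches the applicable inference rule would open

def select_goal(goals):
    """Prefer failures and deterministic steps; postpone real choice points."""
    for g in goals:
        if g.alternatives <= 1:          # 0 = immediate failure, 1 = deterministic step
            return g
    return min(goals, key=lambda g: g.alternatives)   # fewest branches first (assumption)

pending = [
    Goal("D.1 on fact(teaches(c2,Y))", 2),
    Goal("D.2 on ic2", 1),
    Goal("A.1 on insert(teaches(c2,n1))", 3),
]

order = []
while pending:
    g = select_goal(pending)
    pending.remove(g)
    order.append(g.formula)
print(order)
```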

4.3.4 Implementation 

In this section we describe the structure of our implementation. Figure 3 shows a layered view. The uppermost level consists of the specific abductive logic theory of the integration task, i.e., the database information and the integrity constraints. This layer together with the composer forms the abductive meta-theory (see Section 4.2) that is processed by the A-system.
Figure 3: A schematic view of the system components.
As noted above, the composer consists of a meta-theory for integrating the databases
in a coherent way. It is interpreted here as an abductive theory, in which the abducible 
predicates provide the information on how to restore the consistency of the amalgamated 
data. 

The abductive system (enclosed by dotted lines in Figure 3) consists of three main components: a finite domain constraint solver (part of SICStus Prolog), an abductive meta-interpreter (described in the previous sections), and an optimizer.

The optimizer is a component that, given a preference criterion on the space of the 
solutions, computes only the most-preferred (abductive) solutions. Given such a preference 
criterion, this component prunes 'on the fly' those branches of the search tree that lead to 
solutions that are worse than others that have already been computed. This is actually a branch-and-bound 'filter' on the solution space that speeds up execution and makes sure that only the desired solutions will be obtained.$^{23}$ If the preference criterion is a pre-order,
then the optimizer is complete, that is, it can compute all the optimal solutions (more about 
this in Section 4.5). Moreover, this is a general-purpose component, and it may be useful 
not only for data integration, but also for, e.g., solving planning problems. 
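The pruning performed by the optimizer can be illustrated with a generic branch-and-bound search for minimum-cardinality solutions. In the Python sketch below (our own, operating on an explicit list of candidate updates rather than on the A-system's search tree), a branch is cut as soon as its partial cost exceeds the best cost found so far; ties are kept, so all optimal solutions are returned. The usage part applies it to the data of Figure 2, again assuming a finite Herbrand base built from c1, c2 and n1, n2, n3.

```python
def branch_and_bound(candidates, is_solution):
    """All minimum-cardinality subsets S of `candidates` with is_solution(S)."""
    best_cost, best = float("inf"), []

    def search(i, chosen):
        nonlocal best_cost, best
        if len(chosen) > best_cost:          # bound: this branch cannot be optimal
            return
        if i == len(candidates):
            if is_solution(chosen):
                if len(chosen) < best_cost:  # strictly better: discard old incumbents
                    best_cost, best = len(chosen), []
                best.append(chosen)
            return
        search(i + 1, chosen)                         # branch: leave candidate out
        search(i + 1, chosen | {candidates[i]})       # branch: take candidate

    search(0, frozenset())
    return best

# Usage on the data of Figure 2 (finite Herbrand base assumed).
courses, teachers = ["c1", "c2"], ["n1", "n2", "n3"]
db = {("c1", "n1"), ("c2", "n2"), ("c2", "n3")}
herbrand_base = {(c, n) for c in courses for n in teachers}
candidates = sorted([("retract", a) for a in db] +
                    [("insert", a) for a in herbrand_base - db])

def consistent(updates):
    retracted = {a for op, a in updates if op == "retract"}
    inserted = {a for op, a in updates if op == "insert"}
    fs = (db - retracted) | inserted
    return all(len({n for c, n in fs if c == course}) <= 1 for course in courses)

optimal = branch_and_bound(candidates, consistent)
print(optimal)
```

Keeping ties (pruning only on a strictly larger partial cost) is what lets the search return every optimal solution rather than just the first one found.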

4.3.5 Complexity 

It is well known that, in general, the task of repairing a database is not tractable, as there may be an exponential number of different ways of repairing it. Even in cases where the integrity constraints are assumed to be single-headed dependencies (Greco & Zumpano, 2000), checking whether there exists a <-repaired database in which a certain query Q is satisfied is in $\Sigma_2^P$. Checking whether a fact is satisfied by all the <-repaired databases is in $\Pi_2^P$ (see Greco & Zumpano, 2000). This is not surprising in light of the correspondence between computations of <-minimal repairs and computations of entailment relations defined by maximally consistent models (see Propositions 5-8), which are also known to be on the second level of the polynomial hierarchy.
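The exponential number of repairs is easy to reproduce. The following minimal sketch assumes a hypothetical database with n independent key conflicts, considers retractions only, and enumerates the set-inclusion-minimal repairs by brute force; since each conflict can be resolved in two ways, there are 2^n of them.

```python
from itertools import combinations


def key_conflicts(n):
    """Hypothetical database: n keys, each stored with two different values."""
    return [(f"k{i}", v) for i in range(n) for v in ("a", "b")]


def consistent(facts):
    # Illustrative functional dependency: at most one value per key.
    keys = [k for k, _ in facts]
    return len(keys) == len(set(keys))


def minimal_retractions(db):
    """All subset-minimal sets of retracted facts that restore consistency."""
    found = []
    for size in range(len(db) + 1):
        for retract in combinations(db, size):
            r = set(retract)
            if any(prev <= r for prev in found):
                continue  # r contains a smaller repair, so it is not minimal
            if consistent([f for f in db if f not in r]):
                found.append(r)
    return found


for n in range(1, 5):
    print(n, len(minimal_retractions(key_conflicts(n))))  # grows as 2**n
```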

A precise upper bound on the complexity of the A-system is still unknown since, to the best of our knowledge, no complexity results for the SLDNFA refutation procedure are available.

4.4 Example: A Derivation of Repairs by the A-system

Consider again Example 9. The corresponding meta-theory (assuming that the Lloyd-Topor transformation has been applied to it) is given in Figure 4. In this case, and in what follows, we assume that all variables in the denials are universally quantified; to simplify notation, universal quantifiers are omitted from the denial rules.

We have run the code of Figure 4, as well as other examples from the literature, in our system. As Theorem 2 in Section 4.5 guarantees, the output in each case is the set of the most preferred solutions of the corresponding problem. In what follows we demonstrate how some of the most preferred solutions for the meta-theory of Figure 4 are computed.

23. See also the third item of Note 2 (at the end of Section 4.4).

db(teacher(n1))
db(teacher(n2))
db(teacher(n3))
db(teaches(c1,n1))
db(teaches(c2,n2))
db(teaches(c2,n3))

← fact(teaches(X,Y)) ∧ fact(teaches(X,Z)) ∧ (Y ≠ Z)        (ic1)
← fact(teacher(X)) ∧ ¬gives_courses(X)                        (ic2)
gives_courses(X) ← fact(teaches(Y,X))

fact(X) ← db(X) ∧ ¬retract(X)
fact(X) ← insert(X)
← insert(X) ∧ db(X)                                           (composer-ic1)
← retract(X) ∧ ¬db(X)                                         (composer-ic2)

Figure 4: A meta-theory for Example 9.

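We follow one branch in the refutation tree, starting from the initial state $(\mathcal{G}_0, ST_0)$, where the initial set of goals is $\mathcal{G}_0 = \{\mathit{true}, \mathit{ic1}, \mathit{ic2}, \textit{composer-ic1}, \textit{composer-ic2}\}$ and the initial store is $ST_0 = (\emptyset, \emptyset, \emptyset)$. Suppose that the first selected formula is

$$F_1 = \mathit{ic1} = \leftarrow \mathit{fact}(\mathit{teaches}(X,Y)) \wedge \mathit{fact}(\mathit{teaches}(X,Z)) \wedge (Y \neq Z).$$

Then, by D.2,

$$\mathcal{G}_1 = \mathcal{G}_0 \setminus \{F_1\} \cup \left\{ \begin{array}{l} \leftarrow \mathit{db}(\mathit{teaches}(X,Y)) \wedge \neg\mathit{retract}(\mathit{teaches}(X,Y)) \wedge \mathit{fact}(\mathit{teaches}(X,Z)) \wedge (Y \neq Z), \\ \leftarrow \mathit{insert}(\mathit{teaches}(X,Y)) \wedge \mathit{fact}(\mathit{teaches}(X,Z)) \wedge (Y \neq Z) \end{array} \right\},$$

and $ST_1 = ST_0$. Now, pick

$$F_2 = \leftarrow \mathit{db}(\mathit{teaches}(X,Y)) \wedge \neg\mathit{retract}(\mathit{teaches}(X,Y)) \wedge \mathit{fact}(\mathit{teaches}(X,Z)) \wedge (Y \neq Z).$$

Select db(teaches(X,Y)) and unfold it with all the corresponding atoms in the database; then, again by D.2, followed by E.2 and E.3,

$$\mathcal{G}_2 = \mathcal{G}_1 \setminus \{F_2\} \cup \left\{ \begin{array}{l} \leftarrow \neg\mathit{retract}(\mathit{teaches}(c_1,n_1)) \wedge \mathit{fact}(\mathit{teaches}(c_1,Z)) \wedge (n_1 \neq Z), \\ \leftarrow \neg\mathit{retract}(\mathit{teaches}(c_2,n_2)) \wedge \mathit{fact}(\mathit{teaches}(c_2,Z)) \wedge (n_2 \neq Z), \\ \leftarrow \neg\mathit{retract}(\mathit{teaches}(c_2,n_3)) \wedge \mathit{fact}(\mathit{teaches}(c_2,Z)) \wedge (n_3 \neq Z) \end{array} \right\},$$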
and still $ST_2 = ST_1$. Then pick the second denial among the new goals that were added to $\mathcal{G}_2$, and denote it by $F_3$. Since $F_3$ starts with a negated literal, N.2 applies, and the derivation process splits here into two branches. The second branch contains

$$\mathcal{G}_3 = \mathcal{G}_2 \setminus \{F_3\} \cup \{\, \leftarrow \mathit{retract}(\mathit{teaches}(c_2,n_2)),\ \leftarrow \mathit{fact}(\mathit{teaches}(c_2,Z)) \wedge (n_2 \neq Z) \,\},$$

and still $ST_3 = ST_2$. Now choose the first new goal, i.e.,

$$F_4 = \leftarrow \mathit{retract}(\mathit{teaches}(c_2,n_2)).$$

Since $\Delta_3 = \emptyset$, the only option is to add $F_4$ to $\Delta^*_3$. Thus, by A.2,

$$\mathcal{G}_4 = \mathcal{G}_3 \setminus \{F_4\} \quad \text{and} \quad ST_4 = (\emptyset, \{F_4\}, \emptyset).$$

Assume now that we take the second new goal that was added in $\mathcal{G}_3$:

$$F_5 = \leftarrow \mathit{fact}(\mathit{teaches}(c_2,Z)) \wedge (n_2 \neq Z).$$

Following a process of unfolding data similar to the one described above, using db(teaches(c2,n3)), we end up with

$$\leftarrow \neg\mathit{retract}(\mathit{teaches}(c_2,n_3)) \wedge (n_2 \neq n_3).$$

Selecting the negative literal $(n_2 \neq n_3)$, N.2 applies again. The first branch quickly fails after adding $(n_2 = n_3)$ to $\mathcal{E}$. The second branch adds $\leftarrow (n_2 = n_3)$ and retract(teaches(c2,n3)) to the set of goals. The former is added to the constraint store as $(n_2 \neq n_3)$ and simplifies to true. Assume that the latter is selected next, and let this be the $i$-th step. By now $\Delta_{i-1}$ (the set of abducible atoms produced until the current step) is empty, so the only option is to abduce retract(teaches(c2,n3)). Thus, by A.1, $ST_i$ consists of

$$\Delta_i = \{\mathit{retract}(\mathit{teaches}(c_2,n_3))\}, \quad \Delta^*_i = \{F_4\} = \{\leftarrow \mathit{retract}(\mathit{teaches}(c_2,n_2))\}, \quad \mathcal{E}_i = \mathcal{E}_{i-1},$$

and

$$\mathcal{G}_i = \mathcal{G}_{i-1} \setminus \{\mathit{retract}(\mathit{teaches}(c_2,n_3))\} \cup \{\leftarrow \mathit{teaches}(c_2,n_2) = \mathit{teaches}(c_2,n_3)\}.$$

As the last goal is trivially satisfied, ic1 is resolved in this branch.

Now we turn to ic2. So:

$$F_{i+1} = \mathit{ic2} = \leftarrow \mathit{fact}(\mathit{teacher}(X)) \wedge \neg\mathit{gives\_courses}(X).$$

The evaluation of $F_{i+1}$ for either X = n1 or X = n2 succeeds, so the only interesting case is X = n3. In this case the evaluation leads to the goal gives_courses(n3). Unfolding this goal yields that fact(teaches(Y,n3)) appears in the goal set. In order to satisfy this goal, it should be resolved with one of the composer's rules (using D.1). The first rule (i.e., fact(X) ← db(X) ∧ ¬retract(X)) leads to a failure (since retract(teaches(c2,n3)) is already in Δ), and so the second rule of the composer, fact(X) ← insert(X), must be applied. This leads to the abduction of insert(teaches(Y,n3)). By ic1, the constraints Y ≠ c1 and Y ≠ c2 are derived²⁴. Also, composer-ic1 and composer-ic2 are satisfied by the current state, so eventually the solution state that is reached from the derivation path described here contains the following sets:

$$\Delta = \{\mathit{retract}(\mathit{teaches}(c_2,n_3)),\ \mathit{insert}(\mathit{teaches}(Y,n_3))\}, \qquad \mathcal{E} = \{Y \neq c_1,\ Y \neq c_2\},$$

which means retraction of teaches(c2,n3) and insertion of teaches(Y,n3) for some Y ≠ c1 and Y ≠ c2. The other solutions are obtained in a similar way.

24. One can verify that these constraints are indeed detected during the derivation process. We omit the details here in order to keep this example tractable.

Note 2 Below are some remarks on the above derivation process. 

1. The solution above contains a non-ground abducible predicate. This is indeed the expected result, since this solution resolves the contradiction with the integrity constraint ic1 by removing the assumption that teacher n3 teaches course c2. As a result, teacher n3 does not teach any course. Thus, in order to satisfy the other integrity constraint (ic2), the solution indicates that n3 must teach some course (other than c1 and c2).

2. One possible (and realistic) explanation for the cause of the inconsistency in the database of Example 9 and Figure 4 is a typographical error. It might happen, for instance, that c2 was mistakenly typed instead of, say, c3 in teaches(c2,n3). In this case, the database repair computed above pinpoints this possibility (in our case, then, Y should be equal to c3)²⁵. This explanation cannot be explicitly captured unless particular repairs with non-ground solutions are constructed, as is indeed the case here. While some other approaches that have been introduced recently (e.g., Bravo & Bertossi, 2003; Calì, Lembo, & Rosati, 2003) properly capture cases such as those of Example 9, to the best of our knowledge no other application of database integration has this ability.

3. Once the system finds a solution that corresponds to a goal state $S_g = (\mathcal{G}_g, ST_g)$ with $\mathcal{G}_g = \emptyset$ and $ST_g = (\Delta_g, \Delta^*_g, \mathcal{E}_g)$, the <c-optimizer may be used so that whenever a state $S = (\mathcal{G}_s, (\Delta_s, \Delta^*_s, \mathcal{E}_s))$ is reached with $|\Delta_g| < |\Delta_s|$, the corresponding branch of the tree is pruned²⁶.
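The <c-preferred repairs of Example 9 can also be cross-checked by brute force, independently of the SLDNFA derivation. The sketch below is an illustration only: it assumes a finite domain with one extra course, c3 (hypothetical, not part of Figure 4), so that the non-ground insertion teaches(Y,n3) with Y ≠ c1, c2 has a ground instance, and it enumerates all Insert/Retract sets in order of increasing cardinality.

```python
from itertools import combinations

# Example 9 data (Figure 4). Assumption: the course domain is {c1, c2, c3}.
TEACHERS = ["n1", "n2", "n3"]
COURSES = ["c1", "c2", "c3"]
DB = {("teacher", "n1"), ("teacher", "n2"), ("teacher", "n3"),
      ("teaches", "c1", "n1"), ("teaches", "c2", "n2"), ("teaches", "c2", "n3")}
ATOMS = ([("teacher", n) for n in TEACHERS]
         + [("teaches", c, n) for c in COURSES for n in TEACHERS])


def facts(insert, retract):
    # fact(X) <- db(X) & not retract(X);   fact(X) <- insert(X)
    return (DB - retract) | insert


def satisfies_ics(f):
    taught = [a for a in f if a[0] == "teaches"]
    # ic1: no course is taught by two different teachers.
    if any(x1 == x2 and y != z for (_, x1, y) in taught for (_, x2, z) in taught):
        return False
    # ic2: every teacher gives some course (gives_courses/1).
    return all(any(t[2] == a[1] for t in taught) for a in f if a[0] == "teacher")


def min_cardinality_repairs():
    """Brute-force the repairs of minimal cardinality over ATOMS."""
    for k in range(len(ATOMS) + 1):
        found = []
        for changed in combinations(ATOMS, k):
            changed = set(changed)
            insert = changed - DB   # composer-ic1: never insert a stored fact
            retract = changed & DB  # composer-ic2: only retract stored facts
            if satisfies_ics(facts(insert, retract)):
                found.append((insert, retract))
        if found:
            return found
    return []


for ins, ret in min_cardinality_repairs():
    print("insert:", sorted(ins), "retract:", sorted(ret))
```

Under this assumption the search returns four repairs of cardinality 2; one of them, retracting teaches(c2,n3) and inserting teaches(c3,n3), is the ground instance {Y/c3} of the solution derived above.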

4.5 Soundness and Completeness 

In this section we give some soundness and completeness results for the A-system, and relate these results to the model-based preferential semantics considered in Section 3.

In what follows we denote by T an abductive meta-theory (constructed as described in Section 4.2) for composing n given databases $\mathcal{DB}_1, \ldots, \mathcal{DB}_n$. Let also $Proc_{ALP}$ be some sound abductive proof procedure for T²⁷. The following proposition shows that $Proc_{ALP}$ provides a coherent method for integrating the databases that are represented by T.

Proposition 11 Every abductive solution that is obtained by $Proc_{ALP}$ for the query 'true' on a theory T is a repair of UDB.



25. The variable Y is free, and {Y/c3} is an answer substitution, as it grounds $\Delta$ and satisfies $\mathcal{E}$.

26. As the size of $\Delta_s$ can only increase along the derivation, the state $S$ cannot lead to a solution that is better than the one induced by $S_g$, and so the corresponding branch of the tree can indeed be pruned.

27. That is, $Proc_{ALP}$ is a procedure that computes only abductive solutions of T, in the sense of Definition 14.
Proof: By the construction of T, it is easy to see that all the conditions listed in Definition 5 are satisfied. Indeed, the first two conditions are assured by the integrity constraints of the composer. The last condition is also met, since by its soundness $Proc_{ALP}$ produces abductive solutions $\Delta_i$ for the query 'true' on T. Thus, by the second property in Definition 14, for every such solution $\Delta_i = (\mathit{Insert}_i, \mathit{Retract}_i)$ we have that $\mathcal{P} \cup \Delta_i \models \mathcal{IC}$. Since $\mathcal{P}$ contains a data section with all the facts, it follows that $\mathcal{D} \cup \Delta_i \models \mathcal{IC}$, i.e., every integrity constraint follows from $\mathcal{D} \cup \mathit{Insert}_i \setminus \mathit{Retract}_i$. □
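The three conditions used in this proof (no stored fact is inserted, only stored facts are retracted, and the result satisfies the integrity constraints) can be checked directly on ground data. A minimal sketch, assuming the integrity constraints are supplied as a hypothetical Python predicate over the repaired set of facts:

```python
def is_repair(db, insert, retract, ic):
    """Check that (insert, retract) is a repair of (db, ic).

    db, insert, retract: sets of ground facts.
    ic: hypothetical predicate that returns True when every integrity
        constraint holds in the given set of facts.
    """
    return (not (insert & db)                 # inserted facts are new
            and retract <= db                 # only stored facts are retracted
            # Given the two checks above, (db - retract) | insert equals
            # db ∪ insert \ retract, the repaired database.
            and ic((db - retract) | insert))


# Example constraint: "not (p and q)".
def not_p_and_q(facts):
    return not {"p", "q"} <= facts


print(is_repair({"p", "q"}, set(), {"q"}, not_p_and_q))
```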

As SLDNFA is a sound abductive proof procedure (Denecker & De Schreye, 1998), it can be taken as the procedure $Proc_{ALP}$, and so Proposition 11 provides a soundness theorem for the current implementation of the A-system. When an optimizer is incorporated in the A-system, we have the following soundness result for the extended system:

Theorem 1 (Soundness) Every output that is obtained by the query 'true' on T, where the A-system is executed with a <c-optimizer [respectively, with an <i-optimizer], is a <c-preferred repair [respectively, an <i-preferred repair] of UDB.

Proof: Follows from Proposition 11 (since the A-system is based on SLDNFA, which is a sound abductive proof procedure) and the fact that the <c-optimizer prunes paths that lead to solutions that are not <c-preferable. Similar arguments hold for systems with an <i-optimizer. □

Proposition 12 Suppose that the query 'true' has a finite SLDNFA-tree w.r.t. T. Then every <c-preferred repair and every <i-preferred repair of UDB is obtained by running T in the A-system.

Outline of proof: The proof that all the abductive solutions with minimal cardinality are obtained by the system is based on Theorem 10.1 of Denecker and De Schreye (1998), where it is shown that SLDNFA°, an extension of SLDNFA aimed at computing solutions with minimal cardinality, is complete; see Denecker and De Schreye (1998, Section 10.1) for further details. Similarly, the proof that all the abductive solutions that are minimal w.r.t. set inclusion are obtained by the system is based on Theorem 10.2 of Denecker and De Schreye (1998), which shows that SLDNFA+, another extension of SLDNFA aimed at computing minimal solutions w.r.t. set inclusion, is also complete; see Denecker and De Schreye (1998, Section 10.2) for further details.

Now, the A-system is based on the combination of SLDNFA° and SLDNFA+. Moreover, as this system does not change the refutation tree (but only controls the way rules are selected), Theorems 10.1 and 10.2 in Denecker and De Schreye (1998) are applicable in our case as well. Thus, all the <c- and the <i-minimal solutions are produced. In particular, this means that every <c-preferred repair as well as every <i-preferred repair of UDB is produced by our system. □

It should be noted that the last proposition does not guarantee that non-preferred repairs 
will not be produced (as this is not true in general). However, as the following theorem 
shows, the use of an optimizer excludes this possibility. 
Theorem 2 (Completeness) In the notation of Proposition 12 and under its assumptions, the output of the execution of T in the A-system together with a <c-optimizer [respectively, together with an <i-optimizer] is exactly $\mathcal{R}(\mathit{UDB}, <_c)$ [respectively, $\mathcal{R}(\mathit{UDB}, <_i)$].

Proof: We show the claim for the case of <c; the proof w.r.t. <i is similar.

Let (Insert, Retract) ∈ $\mathcal{R}(\mathit{UDB}, <_c)$. By Proposition 12, Δ = (Insert, Retract) is one of the solutions produced by the A-system for T. Now, during the execution of the system together with the <c-optimizer, the path that corresponds to Δ cannot be pruned from the refutation tree: by our assumption, (Insert, Retract) has minimal cardinality among the possible solutions, so the pruning condition is not satisfied. Thus Δ will be produced by the <c-optimized system. For the converse, suppose that (Insert, Retract) is some repair of UDB that is produced by the <c-optimized system. Suppose, for a contradiction, that (Insert, Retract) ∉ $\mathcal{R}(\mathit{UDB}, <_c)$. By the proof of Proposition 12, there is some Δ' = (Insert', Retract') ∈ $\mathcal{R}(\mathit{UDB}, <_c)$ that is constructed by the A-system for T, and (Insert', Retract') <c (Insert, Retract). Hence |Δ'| < |Δ|, and so the <c-optimizer would prune the path of the solution Δ once its cardinality exceeds |Δ'|. This contradicts our assumption that (Insert, Retract) is produced by the <c-optimized system. □

Note 3 The SLDNFA-resolution on which the A-system is based is an extension of SLDNF-resolution (Lloyd, 1987) and coincides with it for logic programs with empty sets of abducible predicates. SLDNF-resolution is complete only if its computation always terminates, and SLDNFA inherits this property. This is why the condition of a finite SLDNFA-tree is imposed in Proposition 12 and Theorem 2. As with SLDNF, the termination of SLDNFA can be guaranteed by imposing syntactic conditions on the program. We refer to Verbaeten (1999), where some conditions are proposed that guarantee the existence of a finite SLDNFA-tree.

In the context of our paper, floundering would arise in the presence of unsafe integrity constraints (e.g., $\forall x\, p(x)$). One way to eliminate this problem is to use a unary domain predicate dom, ranging over the objects of the database, and to add a range for each quantified variable in the integrity constraints, so that we obtain formulae of the form $\forall x\, (\mathit{dom}(x) \rightarrow \psi(x))$ and $\exists x\, (\mathit{dom}(x) \wedge \psi(x))$.

The following results follow immediately from the propositions above and those of Section 3 (unless explicitly stated otherwise, the A-system is used without an optimizer).

Corollary 1 Suppose that the query 'true' has a finite SLDNFA refutation tree w.r.t. the input theory T. Then:

1. for every output (Insert, Retract) of the A-system there is a classical model M of $\mathcal{IC}$ s.t. $\mathit{Insert} = M^t \setminus \mathcal{D}$ and $\mathit{Retract} = \mathcal{D} \setminus M^t$;

2. for every output (Insert, Retract) of the A-system there is a 3-valued model N of $\mathcal{D} \cup \mathcal{IC}$ s.t. $\mathit{Insert}^N = \mathit{Insert}$ and $\mathit{Retract}^N = \mathit{Retract}$.
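Item 1 of Corollary 1 says that an output can be read off a classical model. A minimal sketch, assuming a model is represented extensionally by the set of ground atoms it makes true over a finite domain (a hypothetical representation chosen for illustration):

```python
def repair_from_model(true_atoms, db):
    """(Insert, Retract) induced by a classical model, as in Corollary 1(1).

    true_atoms: the ground atoms that the model assigns 'true'.
    db: the stored data-facts.
    """
    true_atoms, db = set(true_atoms), set(db)
    # Insert: true atoms that are not stored; Retract: stored atoms that are false.
    return true_atoms - db, db - true_atoms


# A model of "not (p and q)" that keeps p, drops q and adds r:
print(repair_from_model({"p", "r"}, {"p", "q"}))
```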

Corollary 2 In the notation of Corollary 1 and under its assumption, we have that:

1. for every output (Insert, Retract) that is obtained by running the A-system together with an <i-optimizer [respectively, together with a <c-optimizer], there is an <i-maximally consistent element [respectively, a <c-maximally consistent element] N in $\mathcal{M}$ s.t. $\mathit{Insert}^N = \mathit{Insert}$ and $\mathit{Retract}^N = \mathit{Retract}$;

2. for every <i-maximally consistent element [respectively, <c-maximally consistent element] N in $\mathcal{M}$ there is a solution (Insert, Retract) that is obtained by running the A-system together with an <i-optimizer [respectively, together with a <c-optimizer] s.t. $\mathit{Insert}^N = \mathit{Insert}$ and $\mathit{Retract}^N = \mathit{Retract}$.

The last corollaries show that the operational semantics induced by the A-system can also be represented by a preferential semantics, in terms of preferred models of the theory. The set $\mathcal{R}(\mathit{UDB}, <)$, which represents the intended meaning of how to '<-recover' the database UDB, can therefore be obtained computationally by the set

$$\{(\mathit{Insert}, \mathit{Retract}) \mid (\mathit{Insert}, \mathit{Retract}) \text{ is an output of the A-system with a } {<}\text{-optimizer}\},$$

or, equivalently, it can be described in terms of preferred models of the theory by the following set:

$$\{(\mathit{Insert}^N, \mathit{Retract}^N) \mid N \text{ is } {<}\text{-maximally consistent in } \mathcal{M}^{\mathit{UDB}}\}.$$

4.6 Handling Specialized Information 

The purpose of this section is to demonstrate the potential use of our system in more complex scenarios, where various kinds of specialized data are incorporated in the system. In particular, we briefly consider time information and source identification. We also give some guidelines on how to extend the system with capabilities for handling these kinds of information.

4.6.1 Timestamped Information

Many database applications contain temporal information. This kind of data may be divided into two types: time information that is part of the data itself, and time information that is related to database operations (e.g., records of database update times). Consider, for instance, $\mathit{birth\_day}(\mathit{John}, 15/05/2001)_{16/05/2001}$. Here, John's date of birth is an instance of the former type of time information, and the subscripted data that describes the time at which this fact was added to the database is an instance of the latter type.

In our approach, timestamp information can be integrated by adding a temporal theory that describes the state of the database at any particular time point. One way of doing so is by using the situation calculus. In this approach a database is described by some initial information and a history of events performed during the database lifetime (see Reiter, 1995). Here we use a different approach, which is based on the event calculus (Kowalski & Sergot, 1986). The idea is to distinguish between two kinds of events: add_db and del_db, which describe the database modifications, and the composer-driven events insert and retract, which are used for constructing database repairs. In this view, the extended composer has the following form:

holds_at(P,T) ← initially(P) ∧ ¬clipped(0,P,T)
holds_at(P,T) ← add(P,E) ∧ E<T ∧ ¬clipped(E,P,T)
clipped(E,P,T) ← del(P,C) ∧ E<C ∧ C<T

add(P,T) ← add_db(P,T)
add(P,T) ← insert(P,T)
del(P,T) ← del_db(P,T)
del(P,T) ← retract(P,T)

← insert(P,T) ∧ retract(P,T)
← insert(P,T) ∧ add_db(P,T)
← retract(P,T) ∧ del_db(P,T)

Note that in the above extended representation, the integrity constraints must be carefully specified. Consider, e.g., the statement that a person can be born on only one date:

← holds_at(birth_day(P,D1),T) ∧ holds_at(birth_day(P,D2),T) ∧ D1≠D2

The problem here is that, to ensure consistency, this constraint must be checked at every point in time. This may be avoided by a simple rewriting that ensures that the constraint is verified only when an event for that person occurs:

ic(P,T) ← holds_at(birth_day(P,D1),T) ∧ holds_at(birth_day(P,D2),T) ∧ D1≠D2
← add_db(birth_day(P,_),T) ∧ NT = T+1 ∧ ic(P,NT)
← insert(birth_day(P,_),T) ∧ NT = T+1 ∧ ic(P,NT)
← ic(P,0)
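To make this reading concrete, here is a minimal sketch over integer time points. It assumes (hypothetically) that fluents are tuples such as ('birth_day', person, date), that add events merge add_db and insert, and that del events merge del_db and retract. It evaluates holds_at as defined by the extended composer and checks the birth-date constraint only at time 0 and at T+1 after each add event for that person, following the rewritten constraint.

```python
def holds_at(p, t, initially, adds, dels):
    """Event-calculus holds_at over integer time points (sketch).

    initially: set of fluents; adds, dels: lists of (fluent, time) events.
    """
    def clipped(e):
        # clipped(E,P,T) <- del(P,C) & E < C & C < T
        return any(q == p and e < c < t for (q, c) in dels)

    # holds_at(P,T) <- initially(P) & not clipped(0,P,T)
    if p in initially and not clipped(0):
        return True
    # holds_at(P,T) <- add(P,E) & E < T & not clipped(E,P,T)
    return any(q == p and e < t and not clipped(e) for (q, e) in adds)


def birth_day_violations(person, initially, adds, dels):
    """Time points at which the rewritten ic(P,T) fires for `person`."""
    def mine(f):
        return f[0] == "birth_day" and f[1] == person

    # <- ic(P,0), plus ic(P,T+1) after every add event on birth_day(P,_).
    times = {0} | {t + 1 for (f, t) in adds if mine(f)}
    dates = ({f[2] for f in initially if mine(f)}
             | {f[2] for (f, _) in adds if mine(f)})
    return [t for t in sorted(times)
            if sum(holds_at(("birth_day", person, d), t, initially, adds, dels)
                   for d in dates) > 1]


init = {("birth_day", "John", "15/05/2001")}
adds = [(("birth_day", "John", "15/06/2001"), 3)]
print(birth_day_violations("John", init, adds, []))
print(birth_day_violations("John", init, adds,
                           [(("birth_day", "John", "15/05/2001"), 2)]))
```

In the usage example, a second birth date added at time 3 clashes with the initial one at time 4, unless the initial fact is deleted before then.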

Note 4 In the last example we have used temporal integrity constraints in order to resolve contradicting update events. Clearly, contradicting events do not necessarily yield a classically inconsistent database, and so the role of such integrity constraints is to express possible events in terms of time and causation and, if necessary, to describe their consequence as a violation of consistency.

Instead of using temporal integrity constraints and the event calculus, one could repair a database with timestamps by using some time-based criterion for making preferences among its repairs. For instance, denote by $\mathit{db}(x_1, \ldots, x_n)_t$ that the data-fact $\mathit{db}(x_1, \ldots, x_n)$ has timestamp $t$, and suppose that (Insert, Retract) and (Insert', Retract') are two repairs of a database $(\mathcal{D}, \mathcal{IC})$. A time-based criterion for preferring (Insert, Retract) over (Insert', Retract') could state, e.g., that for every data-fact $\mathit{db}(x_1, \ldots, x_n)$ and timestamps $t_1, t_2$ s.t. $\mathit{db}(x_1, \ldots, x_n)_{t_1}$ follows from $\mathcal{D} \cup \mathit{Insert} \setminus \mathit{Retract}$ and $\mathit{db}(x_1, \ldots, x_n)_{t_2}$ follows from $\mathcal{D} \cup \mathit{Insert}' \setminus \mathit{Retract}'$, necessarily $t_1 > t_2$. A more detailed treatment of this issue is outside the scope of this paper.

The interested reader may refer, e.g., to (Sripada, 1995; Mareco & Bertossi, 1999) for a detailed discussion of logic-programming-based approaches to the specification of temporal databases. Such specifications can easily be combined with those for repairs given above.
4.6.2 Keeping Track of Source Identities

There are cases in which it is important to preserve the identity of the database from which a specific piece of information originated. This is useful, for instance, when one wants to make preferences among different sources, or when some specific source should be filtered out (e.g., when the corresponding database is not available or becomes unreliable). This kind of information may be encoded by adding to every fact another argument that denotes the identity of its origin. This requires minor modifications in the basic composer, since the composer controls the way in which the data is integrated; as such, it is the only component that can keep track of the source of the information.

Suppose, then, that for every database fact we add another argument that identifies its source, i.e., db(X,S) denotes that X is a fact originating from database S. The composer then has the following form:

fact(X,S) ← db(X,S) ∧ ¬retract(X)
fact(X,composer) ← insert(X)
← insert(X) ∧ db(X,S)
← retract(X) ∧ ¬db(X,S)

Note that the composer considers itself an extra source that inserts brand-new data facts. Now it is possible, e.g., to trace information that comes from a specific source, to make preferences among different sources (by specifying appropriate integrity constraints), and to filter out data that comes from certain sources. The last property is demonstrated by the next rule:

validFact(X) ← fact(X,S) ∧ trusted_source(S)

where trusted_source enumerates all the reliable sources of the data.
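A minimal sketch of the source-aware composer and the validFact filter, assuming facts are plain hashable values and db is a set of (fact, source) pairs (a hypothetical encoding). The composer's two denials on insert and retract are not enforced here; callers are assumed to respect them.

```python
def composed_facts(db, insert, retract):
    """fact(X,S) as defined by the source-aware composer (sketch).

    db: set of (fact, source) pairs; insert, retract: sets of facts.
    """
    # fact(X,S) <- db(X,S) & not retract(X)
    result = {(x, s) for (x, s) in db if x not in retract}
    # fact(X,composer) <- insert(X)
    result |= {(x, "composer") for x in insert}
    return result


def valid_facts(facts, trusted_sources):
    # validFact(X) <- fact(X,S) & trusted_source(S)
    return {x for (x, s) in facts if s in trusted_sources}


db = {("teaches(c1,n1)", "db1"), ("teaches(c2,n2)", "db2")}
facts = composed_facts(db, {"teaches(c3,n3)"}, {"teaches(c2,n2)"})
print(valid_facts(facts, {"db1"}))
```

Treating "composer" as an ordinary source name means that repaired data can itself be trusted or filtered like any other source.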

Note that the last example of 'source identification' can be further extended in order to make preferences among different sources (and not only to ignore some unreliable sources). By introducing a new predicate, trust(Source, Amount), that attaches a certain level of reliability to each source, it is possible, in case of conflicts, to prefer sources with higher reliability as follows:

← fact(X,S) ∧ db(X,S0) ∧ S ≠ S0 ∧ more_trusted(S0,S)
more_trusted(S0,S) ← trust(S0,A0) ∧ trust(S,A) ∧ A0 > A

This method is particularly useful when the integrity constraint above acts as a functional dependency on specific facts. The following example (originally introduced by Subrahmanian, 1994) demonstrates this.

Example 10 Consider the following simple scenario of 'target recognition', where three 
sensors of an autonomous vehicle, which have different degrees of reliability, should identify 
objects in the vehicle's neighborhood: 

trust(radar,10)
trust(gunchar,8)
trust(speedometer,5)

db(observe(object1,t72),radar)
db(observe(object1,t60),gunchar)
db(observe(object1,t80),speedometer)

← fact(observe(O,V1),S) ∧ db(observe(O,V2),S0) ∧ S≠S0 ∧ more_trusted(S0,S)

As the radar has the highest reliability, its observation will be preserved. The observations of the other sensors will be retracted from the database.
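The outcome of Example 10 can be reproduced with a short sketch that retracts every observation whose source is outranked by another source reporting on the same object. This is an illustration under the assumption that trust levels are plain integers; it computes the repair for this example directly rather than by abduction.

```python
TRUST = {"radar": 10, "gunchar": 8, "speedometer": 5}
OBSERVATIONS = [(("object1", "t72"), "radar"),
                (("object1", "t60"), "gunchar"),
                (("object1", "t80"), "speedometer")]


def resolve_by_trust(db, trust):
    """Split observations into kept and retracted ones.

    Mirrors the denial
      <- fact(observe(O,V1),S) & db(observe(O,V2),S0) & S != S0 & more_trusted(S0,S)
    by retracting every observation whose source is beaten by a more trusted
    source that reports on the same object.
    """
    kept, retracted = [], []
    for (obj, val), src in db:
        beaten = any(o == obj and s0 != src and trust[s0] > trust[src]
                     for (o, _), s0 in db)
        (retracted if beaten else kept).append(((obj, val), src))
    return kept, retracted


print(resolve_by_trust(OBSERVATIONS, TRUST))
```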

5. Discussion and an Overview of Related Works 

Interest in systems for coherent integration of databases has been growing continuously in the last few years (see, e.g., Olivé, 1991; Baral et al., 1991, 1992; Revesz, 1993; Subrahmanian, 1994; Bry, 1997; Gertz & Lipeck, 1997; Messing, 1997; Lin & Mendelzon, 1998; Liberatore & Schaerf, 2000; Ullman, 2000; Greco & Zumpano, 2000; Greco et al., 2001; Franconi et al., 2001; Lenzerini, 2001, 2002; Arenas et al., 1999, 2003; Bravo & Bertossi, 2003; Calì et al., 2003, and many others). Already in the early works on this subject it became clear that the design of systems for data integration is a complex task, which demands solutions to many questions from different disciplines, such as belief revision, merging and updating, reasoning with inconsistent information, constraint enforcement, query processing and, of course, many aspects of knowledge representation. In this section we address some of these issues.

One important aspect of data integration systems is how concepts in the independent (stand-alone) data-sources and those of the unified database are mapped to each other. A proper specification of the relations between the source schemas and the schema of the amalgamated data relieves the potential user from having to know where and how data is arranged in the sources. One approach to this mapping, sometimes called global-centric or global-as-view (Ullman, 2000), requires that the unified schema be expressed in terms of the local schemas. In this approach, every term in the unified schema is associated with a view (alternatively, a query) over the sources. This approach is taken by most systems for data integration, including ours. Its main advantage is that it induces a simple query processing strategy that is based on unfolding the query and that uses the same terminology as that of the databases. This is indeed the case in the abductive derivation process defined in Section 4.3.1. The other approach, sometimes called source-centric or local-as-view (used, e.g., in Bertossi et al., 2002), considers every source as a view over the integrated database, and so the meaning of every source is obtained by concepts of the global database. In particular, the global schema is independent of the distributed ones. This implies that adding a new source to the system requires only providing local definitions and does not necessarily involve changes in the global schema. The main advantage of the latter approach is, therefore, that it provides a better setting for maintenance. For a detailed discussion of this topic, see (Ullman, 2000; Lenzerini, 2001; Calì et al., 2002; Van Nuffelen et al., 2004). More references and a survey of different approaches to data integration appear in the papers of Batini, Lenzerini, and Navathe (1986), Rahm and Bernstein (2001), and Lenzerini (2002).
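To illustrate the global-as-view style of mapping, the sketch below defines a global relation teaches as a view over two hypothetical sources with different local schemas, and answers a global query by unfolding the relation into its view definition. The source names and schemas are assumptions made for the example.

```python
# Hypothetical sources; both store (teacher, course) pairs under local names.
SOURCES = {
    "db1.lecturer": {("n1", "c1")},
    "db2.staff_course": {("n2", "c2"), ("n3", "c2")},
}

# Global-as-view: each global relation is defined as a query over the sources.
# teaches(Course, Teacher) is the union of both sources, with columns swapped.
GLOBAL_VIEWS = {
    "teaches": lambda src: ({(c, n) for (n, c) in src["db1.lecturer"]}
                            | {(c, n) for (n, c) in src["db2.staff_course"]}),
}


def answer(relation, condition, sources=SOURCES):
    """Answer a query on a global relation by unfolding it into its view."""
    return {t for t in GLOBAL_VIEWS[relation](sources) if condition(t)}


print(answer("teaches", lambda t: t[0] == "c2"))
```

Note that the unfolded global relation exposes the same kind of conflict on course c2 as in Example 9; in our setting, this is the point at which the composer takes over.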

Another major issue that has to be addressed is the ability of data integration systems to cope properly with dynamically evolving worlds. In particular, the domain of discourse should not be fixed in advance, and information may be revised on a regular basis. The last issue is usually handled by methods of belief revision (Alchourrón et al., 1995; Gärdenfors & Rott, 1995) and nonmonotonic reasoning. In the context of belief revision it is common to distinguish between revisions of integrity constraints and changes in the sets of data-facts, since the two types of information are of a different nature and thus may require different approaches for handling dynamic changes. When the set of integrity constraints is given in clausal form, methods of dynamic logic programming (Alferes et al., 2000, 2002) may be useful for handling revisions. As noted in (Alferes et al., 2002), assuming that each local database is consistent (as in our case), dynamic logic programming (together with a proper language for implementing it, like LUPS (Alferes et al., 2002)) provides a way of avoiding contradictory information, and so it may be viewed as a method of updating a database by a sequence of integrity constraints that arrive at different time points.

When the types of changes are predictable, or can be characterized in some sense, temporal integrity constraints (in the context of temporal databases) can be used in order to specify how to treat new information. This method is also useful when the revision criteria are known in advance (e.g., 'in case of collisions, prefer the more recent data', cf. Section 4.6.1). See, e.g., (Sripada, 1995; Mareco & Bertossi, 1999) for a detailed discussion of temporal integrity constraints and temporal databases in logic-programming-based formalisms.

The second type of revision (i.e., modification of data-facts) is obtained here through the (preferred) repairs of the unified database, which induce corresponding modifications of data-facts. A repair is usually induced by a method of restoring (or assuring) consistency of the amalgamated database by a minimal amount of change. As in our case, the minimization criterion is often determined by the aspiration to remain 'as close as possible' to the collective information. This is a typical kind of repair goal, and the standard ways of formally expressing it are by enumeration methods, such as the following²⁸:



• Minimizing the Hamming distance between the (propositional) models of the unified database and its repairs (Liberatore & Schaerf, 2000), or minimizing the distance between the corresponding three-valued interpretations (de Amo et al., 2002) according to a suitable generalization of the Hamming distance.

• Minimizing the symmetric distance between the sets of consequences of the corresponding databases (Arenas, Bertossi, & Chomicki, 1999; Arenas, Bertossi, & Kifer, 2000; Bertossi, Chomicki, Cortés, & Gutiérrez, 2002) or, equivalently, minimizing in terms of set inclusion (Greco & Zumpano, 2000).

When the underlying data is prioritized, the corresponding quantitative information is also taken into account in the computation of distances (see, for instance, the work of Liberatore & Schaerf, 2000).
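The two enumeration-based measures can be sketched as follows, assuming propositional models and databases are represented as finite sets of true atoms or stored facts. This is a simplification: the cited works measure distances between models or between sets of consequences, which may be larger than the stored facts.

```python
def hamming(m1, m2, atoms):
    """Number of atoms on which two models (sets of true atoms) disagree."""
    return sum((a in m1) != (a in m2) for a in atoms)


def symmetric_difference(s1, s2):
    """Elements that belong to exactly one of the two sets."""
    return set(s1) ^ set(s2)


def closest(original, candidates, atoms):
    """Candidates at minimal Hamming distance from the original model."""
    best = min(hamming(original, c, atoms) for c in candidates)
    return [c for c in candidates if hamming(original, c, atoms) == best]


ATOMS = ["p", "q", "r"]
print(closest({"p", "q"}, [{"p"}, {"q"}, set(), {"p", "r"}], ATOMS))
```

Over a common set of atoms, the Hamming distance between two models equals the size of the symmetric difference of their sets of true atoms, which is one way to see why the two measures often lead to similar notions of minimal change.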



28. See also (Gertz & Lipeck, 1997, Section 5) for a discussion on repair strategies. 
Various ways of computing (preferred/minimal) repairs are described in the literature, among which are proof-theoretical (deductive) methods (Bertossi & Schwind, 2002; de Amo et al., 2002), abductive methods (Kakas & Mancarella, 1990a; Inoue & Sakama, 1995; Sakama & Inoue, 1999, 2000), and algorithmic approaches that are based on computations of maximal consistent subsets (Baral et al., 1991, 1992) or that use techniques from model-based diagnosis (Gertz & Lipeck, 1997). A common approach is to view a database as a logic program and to adopt standard techniques for giving semantics to logic programs in order to compute database repairs. For instance, the stable-model semantics of disjunctive logic programs is used for computing repairs in (Greco & Zumpano, 2000; Greco et al., 2001; Franconi et al., 2001; Arenas et al., 2003), and resolution-based procedures for integrating several annotated databases are introduced by Subrahmanian (1990, 1994). As follows from Section 4, the application introduced here is also based on an extended resolution strategy, applied to logic programs that may contain negation-as-failure operators and abducible predicates.

As repairing a database means, in particular, eliminating contradictions, reasoning with inconsistent information has been a major challenge for data integration systems. First, it is important to note in this respect that not every formalism for handling inconsistency is acceptable in the context of databases, even if the underlying criterion for handling inconsistency is the same as one of the repair goals mentioned above. The following example demonstrates such a case:

Example 11 (Arenas, Bertossi, & Chomicki, 1999) Consider the following (inconsistent)
database: $\mathcal{DB} = (\{p, q\}, \{\neg(p \wedge q)\})$. In the approach of Lin (1996), for instance, $p \vee q$
may be inferred as the repaired database, following a strategy of minimal change. However,
in this approach none of $p$, $q$, and $\neg(p \wedge q)$ holds in the repaired database. In particular
(since Lin (1996) makes no distinction between data-facts and integrity constraints),
the integrity constraint $\neg(p \wedge q)$ itself cannot be inferred, which violates the intended
meaning of an integrity constraint in databases.
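To make Example 11 concrete, the following Python sketch computes repairs of $\mathcal{DB}$ in the spirit of the insert/retract view of repairs used in this paper: a candidate update is a pair of inserted and retracted data-facts, and a repair is an update whose result satisfies the integrity constraint and whose set of changed atoms is minimal with respect to set inclusion. This is a brute-force enumeration over a fixed, finite set of atoms, not SLDNFA-resolution and not the abductive solver described in this paper; the names `repairs`, `subsets`, and `ic_holds` are hypothetical, and inclusion-minimality is an assumption made for illustration.

```python
from itertools import chain, combinations

# Example 11 (hypothetical encoding): data-facts {p, q}, constraint not(p and q).
ATOMS = ("p", "q")
FACTS = frozenset({"p", "q"})


def ic_holds(world):
    """Integrity constraint of Example 11: p and q may not both hold."""
    return not ("p" in world and "q" in world)


def subsets(items):
    """All subsets of `items`, as frozensets, in order of increasing size."""
    items = sorted(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]


def repairs(atoms, facts, ic):
    """Inclusion-minimal (inserted, retracted) updates that restore consistency."""
    absent = set(atoms) - facts
    # Only absent atoms may be inserted; only stored facts may be retracted.
    ok = [(ins, ret) for ins in subsets(absent) for ret in subsets(facts)
          if ic((facts - ret) | ins)]
    # Discard any update whose changed atoms strictly contain another's.
    return [(ins, ret) for ins, ret in ok
            if not any((i | r) < (ins | ret) for i, r in ok)]


for ins, ret in repairs(ATOMS, FACTS, ic_holds):
    print("insert", sorted(ins), "retract", sorted(ret),
          "->", sorted((FACTS - ret) | ins))
```

The two repairs are $\{q\}$ (retract $p$) and $\{p\}$ (retract $q$). Both satisfy $\neg(p \wedge q)$, and $p \vee q$ holds in each of them, so the disjunctive information is kept without losing the integrity constraint.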

Many techniques for consistency enforcement and repairs of constraint violations have 
been suggested, among which are methods for resolving contradictions by quantitative
considerations, such as 'majority vote' (Lin & Mendelzon, 1998; Konieczny & Pino Pérez, 2002),
or by qualitative ones (e.g., defining priorities on different sources of information or preferring
certain data over other data, as in Benferhat, Cayrol, Dubois, Lang, & Prade, 1993, and Arieli,
1999). Another common method of handling inconsistent (and incomplete) information is 
by turning to multi-valued semantics. Three-valued formalisms such as the one considered 
in Section 3 are used as a semantical basis of paraconsistent methods to construct database 
repairs (de Amo, Carnielli, & Marcos, 2002) and are useful in general for pinpointing
inconsistencies (Priest, 1991). Other approaches use lattice-based semantics to encode within
the language itself some meta-information, such as confidence factors, amount of belief for
or against a specific assertion, etc. These approaches combine corresponding formalisms of 
knowledge representation, such as annotated logic programs (Subrahmanian, 1990, 1994; 
Arenas et al., 2000) or bilattice-based logics (Fitting, 1991; Arieli & Avron, 1996; Messing,
1997), together with non-classical refutation procedures (Fitting, 1989; Subrahmanian,
1990; Kifer & Lozinskii, 1992) that allow one to detect inconsistent parts of a database and
maintain them.

6. Summary and Future Work 

In this paper we have developed a formal declarative foundation for rendering coherent 
data, provided by different databases, and presented an application that implements this 
approach. Like similar applications (e.g., Subrahmanian, 1994; Bertossi, Arenas, & Ferretti,
1998; Greco & Zumpano, 2000; Liberatore & Schaerf, 2000), our system mediates among
the sources of information and also between the reasoner and the underlying data. 

Composition of several data-sources is encoded by meta-theories in the form of abductive 
logic programs, and it is possible to extend these theories by providing meta-information 
on the data-facts, such as time-stamps and source identities. Moreover, since the reasoning 
process of the system is based on a pure generalization of classical refutation procedures,
neither a syntactical embedding of first-order formulae into other languages nor any
extension of two-valued semantics is necessary.

Due to the inherent modularity of the system, each component is independent and can be
modified to meet different needs. Thus, for instance, the underlying solver may be replaced 
with any other solver that is capable of dealing with the meta-theory, and any improvement 
of the optimizer will affect the whole system and its efficiency, regardless of the nature of
its input. Also, the way of keeping data coherent is encapsulated in the component that
integrates the data (i.e., the composer). This implies, in particular, that neither input from
the reasoner nor any other external policy for making preferences among conflicting sources
is required in order to resolve contradictions.

As we have shown, the operational semantics for inconsistent databases, induced by
the A-system, is strongly related to (multi-valued) preferential semantics. Since preferential
semantics provides the background for many non-monotonic and paraconsistent formalisms
(e.g., Shoham, 1988; Priest, 1989, 1991; Kifer & Lozinskii, 1992; Arieli & Avron, 1996;
Arieli, 1999, 2003), this implies that the A-system may be useful for reasoning with general
uncertain theories (not necessarily in the form of databases).
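The link between repairs and preferential semantics can be illustrated on the database of Example 11. The sketch below assumes a three-valued logic in the style of Priest's LP: truth values ordered $f < \top < t$, with $\top$ and $t$ designated, negation swapping $t$ and $f$ while fixing $\top$, and conjunction as minimum. A model assigns a designated value to every formula, and the preferred (most consistent) models are those whose set of $\top$-valued atoms is minimal under inclusion. These definitions are assumptions made for illustration and may differ in detail from those of Section 3; the function names are hypothetical.

```python
from itertools import product

# Assumed LP-style truth values, ordered f < top < t.
F, TOP, T = 0, 1, 2
VALUES = (F, TOP, T)
DESIGNATED = {TOP, T}


def neg(x):
    return T - x  # swaps t and f, leaves TOP fixed


def conj(x, y):
    return min(x, y)  # meet in the order f < top < t


def formulas(v):
    # Example 11 as formulas: the facts p, q and the constraint not(p and q).
    return [v["p"], v["q"], neg(conj(v["p"], v["q"]))]


def models(atoms):
    """All valuations giving every formula a designated value."""
    for vals in product(VALUES, repeat=len(atoms)):
        v = dict(zip(atoms, vals))
        if all(val in DESIGNATED for val in formulas(v)):
            yield v


def inconsistent_atoms(v):
    return frozenset(a for a, x in v.items() if x == TOP)


def most_consistent(atoms):
    """Models whose set of TOP-valued atoms is inclusion-minimal."""
    ms = list(models(atoms))
    return [v for v in ms
            if not any(inconsistent_atoms(w) < inconsistent_atoms(v) for w in ms)]


for v in most_consistent(("p", "q")):
    print(v)
```

The preferred models assign $\top$ to exactly one of $p$ and $q$ (and $t$ to the other). Reading the $\top$-valued atom as the one to retract gives the repairs $\{q\}$ and $\{p\}$, which is the kind of correspondence between repairs and most consistent models referred to above.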

It is important to note that our composing system inherits the functionality of the 
underlying solver. The outcome of this is flexibility, modularity, simple interaction with 
different sources of information, and the ability to reason with any set of first-order
formulae as integrity constraints.²⁹ To the best of our knowledge, no other data integration
application has this ability.

There are several directions for further exploration. First, as we have already noted, two 
more phases, which have not been considered here, might be needed for a complete process 
of data integration: 



29. Provided, of course, that the constraints do not lead to floundering.

a) translation of different concepts to a unified ontology, and

b) integration of integrity constraints. 

So far, formalisms for dealing with the first item (e.g., Lenzerini, 2001, 2002; Van Nuffelen 
et al., 2004) mainly focus on the mutual relations between the global schema and the source 
(local) schemas, in particular how concepts of each ontology map to each other. On the 
other hand, formalisms for handling the second item concentrate on nonmonotonic reasoning
for dynamically evolving (and mutually inconsistent) worlds. Synthesizing the main ideas
behind these approaches and incorporating them into our system is a major challenge
for future work.

Another important issue that deserves attention is the repair of inconsistency in the 
context of deductive databases with integrity constraints and definitions of predicates, often
called view predicates. We refer to Denecker (2000) for a sketch of how this may be
done. This kind of data may be further combined with (possibly inconsistent) temporal 
information, (partial) transactions, and (contradictory) update information. 

Finally, since different databases may have different information about the same predicates,
it is reasonable to use some weakened version of the closed world assumption as part
of the integration process (for instance, an assumption that something is false unless it is 
in the database, or unless some other database has some information about it). 
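One possible reading of the weakened closed world assumption suggested above can be sketched as follows. The encoding is hypothetical: each source maps ground atoms (written as strings) to `True` when the atom is stored as a fact, or to `False` as an assumed way of recording explicit negative information, and the function name `weak_cwa` is illustrative only. An atom is classified as true if the local database stores it, as open (the closed world assumption is not applied) if some other source has an entry for it, and as false otherwise.

```python
from typing import Dict, List

# Hypothetical encoding: ground atom -> True (stored fact) or False
# (an assumed form of explicit negative information).
Source = Dict[str, bool]


def weak_cwa(atom: str, local: Source, others: List[Source]) -> str:
    """Classify `atom` under one reading of a weakened closed world assumption."""
    if local.get(atom) is True:
        return "true"   # the atom is in the (local) database
    if any(atom in src for src in others):
        return "open"   # another source has information: do not assume false
    return "false"      # no source mentions the atom: the usual CWA applies


local_db: Source = {"p(a)": True}
other_dbs: List[Source] = [{"p(b)": True}, {"p(c)": False}]

for atom in ("p(a)", "p(b)", "p(c)", "p(d)"):
    print(atom, weak_cwa(atom, local_db, other_dbs))
```

Under the plain closed world assumption, `p(b)` and `p(c)` would both be false with respect to the local database; in this sketch their value is instead left to the integration step, and only `p(d)`, which no source mentions, is assumed false.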